The RSI–returns cross-correlation (CC) gives the CC for certain lags. Our model actually uses the sequence RSI[t-20, t] (and probably also single values) to predict mean(Returns[t+1, t+6]), so not quite this CC, but maybe a better CC can be engineered. Nonetheless, I asked ChatGPT:
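To illustrate what such an engineered CC could look like, here is a sketch that correlates RSI at various lags directly with the model's actual target, mean(Returns[t+1, t+6]), rather than with single-lag returns. All series here are synthetic and the planted RSI effect is made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5000
rsi = rng.normal(50, 10, N)                    # stand-in RSI series (illustrative)
returns = rng.normal(0.0, 0.01, N)
returns[6:] += -0.002 * (rsi[:-6] - 50) / 10   # planted effect: RSI leads returns by 6 steps

# Target actually used by the model: mean of the next 6 returns.
horizon = 6
T = N - horizon
fwd_mean = np.array([returns[t + 1 : t + 1 + horizon].mean() for t in range(T)])

def lagged_corr(lag):
    # corr(rsi[t - lag], fwd_mean[t])
    if lag == 0:
        return np.corrcoef(rsi[:T], fwd_mean)[0, 1]
    return np.corrcoef(rsi[:T - lag], fwd_mean[lag:])[0, 1]

for lag in (0, 5, 10, 20):
    print(f"lag {lag:2d}: corr = {lagged_corr(lag):+.4f}")
```

Because the target aggregates six future returns, a single lag-k effect smears across several lags of this curve, which is exactly why it may be a better-matched CC than the per-lag one.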
I noticed that the cross-correlation between RSI and the returns does actually contain information: it first goes continuously down to -0.01 for lags of 6, then continuously goes up to 0.007 for lags of 20. So it seems to have predictive power, but I am wondering why the LSTM does not capture it.
ChatGPT Answer
Great — that cross-correlation pattern is exactly the sort of weak but structured signal LSTMs should be able to exploit, so it’s useful you found it. If the LSTM still isn’t picking it up, the culprit is almost always one (or more) of these: temporal windowing, data alignment, signal-to-noise, model / training config, or evaluation method. Below I’ll give a short diagnosis checklist and then a precise, ordered set of experiments you can run (with code snippets) that will tell you exactly what’s going wrong and — in most cases — fix it.
Quick checklist (why LSTM may miss the lagged RSI→return relation)
Lookback too short. Your cross-correlation shows useful lags up to ~20. If lookb < 20 the model can’t see the signal.
Target alignment. Are you predicting the correct future return relative to the RSI lags you measured? Make sure target = return at t+τ while RSI lags are t−k as intended.
Feature representation. A single RSI channel may be hard to extract lags from — explicitly adding lagged RSI features often helps.
Signal magnitude vs noise. Correlations ~0.01 are tiny. If the target variance is dominated by other effects, the network will ignore the tiny signal.
Model / training issues. Wrong final activation (sigmoid), poor LR, too strong regularization, batch shuffling, or insufficient capacity for that specific temporal pattern.
No baseline test. If a simple linear regressor on lagged RSI can’t capture the effect, the problem is in data/feature engineering rather than the network.
Experiments to run (ordered: fast → more involved)
A — Verify the signal with a linear baseline
This tells you if the signal is actually usable by a simple model.
```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Build a matrix of RSI lags.
# rsi: shape (N,), returns: shape (N,)
max_lag = 30
X = np.vstack([np.roll(rsi, k) for k in range(1, max_lag + 1)]).T  # shape (N, max_lag)
y = returns  # aligned so that lagged RSI features predict the current return

# Trim rows contaminated by roll wraparound.
X = X[max_lag:]
y = y[max_lag:]

Xtr, Xv, ytr, yv = train_test_split(X, y, test_size=0.2, shuffle=False)
r = Ridge(alpha=1.0).fit(Xtr, ytr)
print("Ridge R^2 on val:", r.score(Xv, yv))
print("Corr(pred, y):", np.corrcoef(r.predict(Xv), yv)[0, 1])
```
If the Ridge (or simple OLS) shows nonzero R² or positive corr, the signal is learnable. If not, your LSTM won’t be able to either until you fix the data.
B — Ensure lookback covers lags
If you saw predictive power up to lag 20, set lookb >= 20 (better 30–40).
In your LSTM call, use input_shape=(lookb, n_features) and make sure your data slicing produces that full window.
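One quick way to verify that the slicing really produces full windows is to build them with NumPy's `sliding_window_view` and spot-check the alignment (array names and shapes here are illustrative):

```python
import numpy as np

lookb = 30        # lookback covering all predictive lags (>= 20)
n_features = 4
raw = np.arange(200 * n_features, dtype=float).reshape(200, n_features)

# Build windows; window i covers raw[i : i + lookb].
windows = np.lib.stride_tricks.sliding_window_view(raw, lookb, axis=0)
windows = windows.transpose(0, 2, 1)   # -> (num_windows, lookb, n_features)

print(windows.shape)                   # (171, 30, 4)
# Sanity check: the last timestep of window 0 equals raw[lookb - 1].
assert np.allclose(windows[0, -1], raw[lookb - 1])
```

If the last timestep of each window does not line up with the most recent observation before the target, the lookback is effectively shifted and part of the lag structure is lost.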
C — Add explicit lagged RSI features (fast, powerful)
Even for LSTMs, providing lagged versions of a small-but-important signal helps the network focus:
```python
import numpy as np

def add_lagged_features(features, rsi, maxlag=20):
    # features: (N, lookb, n_features); rsi: (N,)
    N = features.shape[0]
    lagged = np.zeros((N, maxlag))
    for k in range(maxlag):
        lagged[:, k] = np.roll(rsi, k + 1)  # lags 1..maxlag
    # Note: the first maxlag rows contain roll wraparound; trim or mask them
    # consistently with the rest of your pipeline.
    lagged = lagged.reshape(N, 1, maxlag)
    # Broadcast the lag block across all timesteps and append as extra channels.
    features = np.concatenate([features, np.tile(lagged, (1, features.shape[1], 1))], axis=2)
    return features
```
This forces the model to see lagged RSI explicitly; it’s often the quickest win.
D — Check alignment & target definition
Make sure X[t] contains RSI up to time t and y[t] is the return you want to predict (e.g., next-hour return). A common bug: off-by-one shifting makes the model never see the predictive RSI at the right moment.
```python
# Quick check: the last timestep's RSI in a window vs. the target it should predict.
print("Example RSI window (last timestep):", X_val[0, -1, rsi_index])
print("Target comparable return:", y_val[0])
```
E — Small diagnostic LSTM (keep it simple)
Train a very small LSTM only on RSI (and lagged RSI) to see if the architecture can learn the pattern. This isolates other features.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# X_rsi shape: (N, lookb, 1)
model = Sequential([LSTM(16, input_shape=(lookb, 1)), Dense(1, activation='linear')])
model.compile(optimizer=Adam(1e-3), loss='mse')
model.fit(Xtr_rsi, ytr, validation_data=(Xv_rsi, yv), epochs=100, batch_size=32,
          callbacks=[EarlyStopping(monitor='val_loss', patience=8, restore_best_weights=True)])
```
If this small model learns (val loss drops), your full-feature LSTM should too; if not, the problem is data/signal strength.
F — Training tips to surface weak signals
Use linear final activation (no sigmoid) for regression.
Standardize features and target (zero mean, unit std) using scaler fit on train only.
Disable shuffling in .fit() (shuffle=False) if you use stateful LSTMs, which require sequence order across batches; with independently sliced windows it mainly keeps runs reproducible.
Lower LR if unstable: try 1e-3 → 3e-4 → 1e-4.
Gradient clipping: Adam(learning_rate=1e-3, clipnorm=1.0).
Batch size: try 32; tiny batches produce noisy gradients that obscure weak patterns.
EarlyStopping + ReduceLROnPlateau as before.
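The scaling point above is a common source of leakage; here is a minimal sketch with scalers fit on the training split only (array names and the split point are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(5.0, 3.0, size=(1000, 30, 2))   # (samples, lookb, n_features)
y = rng.normal(0.0, 0.01, size=1000)

split = 800
# Fit scalers on the training portion only, then apply them everywhere.
x_scaler = StandardScaler().fit(X[:split].reshape(-1, X.shape[2]))
y_scaler = StandardScaler().fit(y[:split].reshape(-1, 1))

Xs = x_scaler.transform(X.reshape(-1, X.shape[2])).reshape(X.shape)
ys = y_scaler.transform(y.reshape(-1, 1)).ravel()

# Train portion is ~zero mean / unit std; validation uses the same statistics.
print(Xs[:split].mean(), Xs[:split].std())
```

Fitting the scalers on the full series would let validation-period statistics leak into training, which can mask or fake exactly the kind of weak signal being hunted here.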
G — Measure direct sensitivity of model predictions to RSI lags
After training, test sensitivity: for a validation sample, set RSI lag k to zero and see prediction change; or compute input gradients (with tf.GradientTape) to see which timesteps/features the model uses.
Permutation importance example:
```python
# Permutation importance for the RSI channel.
base_loss = model.evaluate(Xv, yv, verbose=0)
Xpv = Xv.copy()
# Permute the RSI channel across samples to break its link with the target.
Xpv[:, :, rsi_index] = np.random.permutation(Xpv[:, :, rsi_index])
loss_perm = model.evaluate(Xpv, yv, verbose=0)
print("loss increase:", loss_perm - base_loss)
```
If permuting RSI increases loss, the model uses RSI.
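The gradient-based variant mentioned above can be sketched with tf.GradientTape on an untrained stand-in model (shapes and rsi_index are illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model and data, just to show the mechanics.
lookb, n_features, rsi_index = 30, 3, 0
model = tf.keras.Sequential([
    tf.keras.Input(shape=(lookb, n_features)),
    tf.keras.layers.LSTM(8),
    tf.keras.layers.Dense(1, activation='linear'),
])
Xv = np.random.default_rng(2).normal(size=(16, lookb, n_features)).astype('float32')

# Gradient of the prediction w.r.t. the inputs: which timesteps/features matter?
x = tf.convert_to_tensor(Xv)
with tf.GradientTape() as tape:
    tape.watch(x)
    pred = model(x)
grads = tape.gradient(pred, x)   # shape (batch, lookb, n_features)

# Average absolute sensitivity of predictions to the RSI channel, per timestep.
sensitivity = tf.reduce_mean(tf.abs(grads[:, :, rsi_index]), axis=0)
print(sensitivity.numpy().round(4))
```

On a trained model, the sensitivity profile across timesteps should roughly mirror the cross-correlation structure if the network is actually exploiting it.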
H — If signal is tiny, amplify it for training experiments
Temporarily multiply your synthetic RSI-driven shift by 5–10×. If the model then learns, you know it was an SNR issue and can tune to detect smaller effects (longer training, better architecture, ensembling).
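A minimal sketch of this amplification experiment, assuming you control the synthetic data generation (names and magnitudes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2000
noise = rng.normal(0.0, 0.01, N)
rsi = rng.normal(50, 10, N)
base_effect = -0.0002 * (rsi - 50) / 10   # the tiny planted RSI-driven shift

# Scale the planted effect and watch the raw correlation become visible.
for amp in (1, 5, 10):
    returns = noise + amp * base_effect
    corr = np.corrcoef(rsi, returns)[0, 1]
    print(f"amplification {amp:2d}x: corr(RSI, returns) = {corr:+.4f}")
```

Once the model reliably learns the 10× version, dial the amplification back down to find the SNR threshold where learning breaks.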
Two quick sanity checks you can run now
Linear baseline performance (Ridge / OLS) on lagged RSI must show some predictive power (non-zero corr). If Ridge fails, LSTM isn’t expected to do well either.
Train small LSTM on RSI-only (no extra features) with lookb >= 20. If it learns, then the full model needs different tuning (maybe regularization was hiding the effect).
Summary: a concise plan of action
Confirm lookb >= 20.
Build lagged-RSI features and try a Ridge baseline (fast).
Train a small LSTM on RSI-only (linear final activation, lr=1e-3, batch=32, shuffle=False).
If baseline learns but your big model doesn’t: check regularization (reduce dropout/L2) and scaling; try removing aggressive reg until signal appears.
If baseline fails, increase synthetic signal magnitude to debug, or revisit target alignment.