
embedding

Functions:

| Name | Description |
| --- | --- |
| `lagged_embed` | Lagged embedding of a time series x. |
| `scan` | Grid search over (E, tau) with cross-validation. |
| `select` | Select best (E, tau) from scan results. |
`lagged_embed(x: np.ndarray, tau: int, e: int)`

Lagged embedding of a time series x.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | ndarray | 1D time series of shape (N,). | required |
| `tau` | int | Time delay. | required |
| `e` | int | Embedding dimension. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Embedded array of shape (N - (e - 1) * tau, e). |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If x is not a 1D array, if tau or e is not positive, or if e * tau >= len(x). |
Notes
  • While open to interpretation, it is generally more intuitive to view the embedding as starting from the ((e - 1) * tau)-th element of the original time series and ending at the (len(x) - 1)-th element (the last value), rather than starting from the 0th element and ending at element len(x) - 1 - (e - 1) * tau.
  • This distinction reflects whether we think of “attaching past values to the present” or “attaching future values to the present”. The information content of the result is the same either way.
  • The use of reversed in the implementation emphasizes this perspective.

Examples:

```python
import numpy as np
from edm.embedding import lagged_embed

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tau = 2
e = 3
E = lagged_embed(x, tau, e)
print(E)
# [[4 2 0]
#  [5 3 1]
#  [6 4 2]
#  [7 5 3]
#  [8 6 4]
#  [9 7 5]]
print(E.shape)
# (6, 3)
```
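The Notes above can be made concrete with a minimal sketch of such an embedding (an illustrative re-implementation, not the library's code; assumes NumPy and that the output should match the example output shown here):

```python
import numpy as np

def lagged_embed_sketch(x: np.ndarray, tau: int, e: int) -> np.ndarray:
    # Hypothetical re-implementation for illustration only; the actual
    # lagged_embed may differ in validation and edge-case handling.
    n_rows = len(x) - (e - 1) * tau
    # reversed(range(e)) puts the most recent value in the leftmost column
    # and the oldest lag in the rightmost: row i ends at x[i + (e - 1) * tau],
    # i.e. each row "attaches past values to the present".
    cols = [x[k * tau : k * tau + n_rows] for k in reversed(range(e))]
    return np.column_stack(cols)

x = np.arange(10)
E = lagged_embed_sketch(x, tau=2, e=3)  # same (6, 3) result as the example
```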
`scan(x: np.ndarray, Y: np.ndarray | None = None, *, E: list[int], tau: list[int], n_ahead: int = 1, split: SplitFunc | None = None, predict: PredictFunc | None = None, metric: MetricFunc | None = None) -> np.ndarray`

Grid search over (E, tau) with cross-validation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | ndarray, shape (N,) | Time series to embed. | required |
| `Y` | ndarray or None, shape (N,) or (N, M) | Prediction target. If None, self-prediction (Y = x). | None |
| `E` | list[int] | Embedding dimension candidates. | required |
| `tau` | list[int] | Time delay candidates. | required |
| `n_ahead` | int | Prediction horizon (steps ahead). | 1 |
| `split` | SplitFunc or None | Callable (n: int) -> list[Fold]. Defaults to sliding_folds. | None |
| `predict` | PredictFunc or None | Prediction function. Defaults to simplex_projection. | None |
| `metric` | MetricFunc or None | Evaluation metric. Defaults to mean_rho. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| scores | ndarray, shape (len(E), len(tau), K_max) | Per-fold CV metric for each (E, tau) combination. K_max is the maximum number of folds across all E values. Entries where the fold does not exist are NaN. |
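To illustrate the shape convention of the returned array, here is a synthetic scores array (hypothetical values, not produced by a real scan call) and the per-(E, tau) fold mean one would typically compute from it:

```python
import numpy as np

E = [2, 3]
tau = [1, 2]
K_max = 3  # maximum number of folds across all E values

# Hypothetical per-fold correlations; NaN marks a fold that does not
# exist for that (E, tau) combination.
scores = np.array([
    [[0.90, 0.85, 0.88], [0.80, 0.82, np.nan]],
    [[0.95, 0.91, np.nan], [0.70, np.nan, np.nan]],
])

# Mean over valid folds for each (E, tau); NaN entries are ignored.
mean_scores = np.nanmean(scores, axis=-1)
```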
`select(scores: np.ndarray, *, E: list[int], tau: list[int]) -> tuple[int, int, float]`

Select best (E, tau) from scan results.

Ranks each (E, tau) by mean - SE where SE is the standard error of the mean across folds. This penalises combinations whose scores vary widely across folds (unstable predictions) and those with fewer valid folds (less certainty), favouring parameters we are confident perform well.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `scores` | ndarray, shape (len(E), len(tau), K_max) | Output of scan. | required |
| `E` | list[int] | Embedding dimension candidates (same as passed to scan). | required |
| `tau` | list[int] | Time delay candidates (same as passed to scan). | required |

Returns:

| Type | Description |
| --- | --- |
| (best_E, best_tau, best_score) | best_score is the mean over folds (not the adjusted value) so that it remains directly interpretable. |
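The mean - SE ranking described above can be sketched as follows. This is a hypothetical re-implementation (`select_sketch`, with made-up score values) for illustration; the real select may differ in tie-breaking and in how it handles degenerate folds:

```python
import numpy as np

def select_sketch(scores: np.ndarray, E: list[int], tau: list[int]) -> tuple[int, int, float]:
    # Sketch of the mean - SE rule; assumes every (E, tau) slice has at
    # least two valid (non-NaN) folds so the sample SE is defined.
    k = np.sum(~np.isnan(scores), axis=-1)            # valid folds per combo
    mean = np.nanmean(scores, axis=-1)                # mean over valid folds
    se = np.nanstd(scores, axis=-1, ddof=1) / np.sqrt(k)
    # Fewer folds inflate SE, so unstable or sparsely validated combos
    # are penalised even if their raw mean is high.
    i, j = np.unravel_index(np.nanargmax(mean - se), mean.shape)
    # Report the plain fold mean, not the adjusted value.
    return E[i], tau[j], float(mean[i, j])

# Hypothetical per-fold scores: E=3, tau=1 has both a high mean and low
# spread, so it wins under mean - SE.
scores = np.array([
    [[0.90, 0.85, 0.88], [0.80, 0.82, 0.78]],
    [[0.95, 0.91, 0.93], [0.70, 0.72, 0.71]],
])
best = select_sketch(scores, E=[2, 3], tau=[1, 2])
```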