embedding

Functions:
| Name | Description |
|---|---|
lagged_embed | Lagged embedding of a time series x. |
scan | Grid search over (E, tau) with cross-validation. |
select | Select best (E, tau) from scan results. |
lagged_embed
lagged_embed(x: np.ndarray, tau: int, e: int)

Lagged embedding of a time series x.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x | ndarray | 1D time series of shape (N,). | required |
tau | int | Time delay. | required |
e | int | Embedding dimension. | required |
Returns:
| Type | Description |
|---|---|
ndarray | Embedded array of shape (N - (e - 1) * tau, e). |
Raises:
| Type | Description |
|---|---|
| ValueError | If x is not a 1D array, if tau or e is not positive, or if e * tau >= len(x). |
Notes
- While open to interpretation, it is generally more intuitive to consider the embedding as starting from the (e - 1) * tau-th element of the original time series and ending at the len(x) - 1-th element (the last value), rather than starting from the 0th element and ending at len(x) - 1 - (e - 1) * tau.
- This distinction reflects whether we think of “attaching past values to the present” or “attaching future values to the present”. The information content of the result is the same either way.
- The use of reversed in the implementation emphasizes this perspective.
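The semantics above can be sketched in a few lines. This is a minimal, self-contained illustration of the documented behaviour (the function name `lagged_embed_sketch` is ours), not the library's actual implementation, which the notes say is written with reversed:

```python
import numpy as np

def lagged_embed_sketch(x: np.ndarray, tau: int, e: int) -> np.ndarray:
    """Sketch: row t holds (x[t], x[t - tau], ..., x[t - (e - 1) * tau])."""
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError("x must be a 1D array")
    if tau <= 0 or e <= 0:
        raise ValueError("tau and e must be positive")
    if e * tau >= len(x):
        raise ValueError("e * tau must be smaller than len(x)")
    start = (e - 1) * tau
    # Column j is x lagged by j * tau; valid rows start at index (e - 1) * tau,
    # i.e. the embedding "attaches past values to the present".
    return np.column_stack(
        [x[start - j * tau : len(x) - j * tau] for j in range(e)]
    )
```

The first valid row is at index (e - 1) * tau, so the output has N - (e - 1) * tau rows, matching the documented return shape.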
Examples:

```python
import numpy as np
from edm.embedding import lagged_embed

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tau = 2
e = 3

E = lagged_embed(x, tau, e)
print(E)
print(E.shape)
# [[4 2 0]
#  [5 3 1]
#  [6 4 2]
#  [7 5 3]
#  [8 6 4]
#  [9 7 5]]
# (6, 3)
```

scan

scan(x: np.ndarray, Y: np.ndarray | None = None, *, E: list[int], tau: list[int], n_ahead: int = 1, split: SplitFunc | None = None, predict: PredictFunc | None = None, metric: MetricFunc | None = None) -> np.ndarray

Grid search over (E, tau) with cross-validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x | (ndarray, shape(N)) | Time series to embed. | required |
Y | (ndarray or None, shape(N) or (N, M)) | Prediction target. If None, self-prediction (Y = x). | None |
E | list[int] | Embedding dimension candidates. | required |
tau | list[int] | Time delay candidates. | required |
n_ahead | int | Prediction horizon (steps ahead). | 1 |
split | SplitFunc or None | Callable (n: int) -> list[Fold]. Defaults to sliding_folds. | None |
predict | PredictFunc or None | Prediction function. Defaults to simplex_projection. | None |
metric | MetricFunc or None | Evaluation metric. Defaults to mean_rho. | None |
Returns:
| Name | Type | Description |
|---|---|---|
scores | (ndarray, shape(len(E), len(tau), K_max)) | Per-fold CV metric for each (E, tau) combination. K_max is the maximum number of folds across all E values. Entries where the fold does not exist are NaN. |
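The NaN-padding convention in the returned array can be illustrated with a small self-contained sketch. The per-fold score values and fold counts below are made up for illustration; only the shape and padding semantics mirror the description above:

```python
import numpy as np

# Hypothetical per-(E, tau) fold scores: larger E can leave fewer usable
# folds, so the lists may have different lengths.
fold_scores = {
    (2, 1): [0.90, 0.80, 0.85],
    (2, 2): [0.70, 0.75, 0.72],
    (3, 1): [0.95, 0.90],
    (3, 2): [0.88, 0.91],
}
E_list, tau_list = [2, 3], [1, 2]
K_max = max(len(v) for v in fold_scores.values())

# Shape (len(E), len(tau), K_max); missing folds stay NaN.
scores = np.full((len(E_list), len(tau_list), K_max), np.nan)
for i, Ei in enumerate(E_list):
    for j, tj in enumerate(tau_list):
        s = fold_scores[(Ei, tj)]
        scores[i, j, : len(s)] = s
```

Downstream code (such as select) can then aggregate over the last axis with NaN-aware reductions like np.nanmean.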
select
select(scores: np.ndarray, *, E: list[int], tau: list[int]) -> tuple[int, int, float]

Select best (E, tau) from scan results.
Ranks each (E, tau) by mean - SE where SE is the standard error
of the mean across folds. This penalises combinations whose scores
vary widely across folds (unstable predictions) and those with fewer
valid folds (less certainty), favouring parameters we are confident
perform well.
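The mean - SE ranking can be sketched as follows. This is an illustration of the rule described above under the NaN-padding convention used by scan, not the library's actual code (the name `select_sketch` is ours):

```python
import numpy as np

def select_sketch(
    scores: np.ndarray, *, E: list[int], tau: list[int]
) -> tuple[int, int, float]:
    # Number of valid (non-NaN) folds for each (E, tau).
    n = np.sum(~np.isnan(scores), axis=-1)
    mean = np.nanmean(scores, axis=-1)
    # Standard error of the mean across folds; fewer folds -> larger SE.
    se = np.nanstd(scores, axis=-1, ddof=1) / np.sqrt(n)
    # Penalise combinations that are unstable or have few valid folds.
    rank = mean - se
    i, j = np.unravel_index(np.nanargmax(rank), rank.shape)
    # Return the raw mean, not mean - SE, so the score stays interpretable.
    return E[i], tau[j], float(mean[i, j])
```

Note that a combination with a high mean but wildly varying fold scores can lose to a slightly worse but more consistent one, which is the intended behaviour.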
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
scores | (ndarray, shape(len(E), len(tau), K_max)) | Output of scan. | required |
E | list[int] | Embedding dimension candidates (same as passed to scan). | required |
tau | list[int] | Time delay candidates (same as passed to scan). | required |
Returns:
| Type | Description |
|---|---|
| tuple[int, int, float] | (best_E, best_tau, best_score). best_score is the mean over folds (not the adjusted value) so that it remains directly interpretable. |