
embedding

Functions:

| Name | Description |
| --- | --- |
| `lagged_embed` | Lagged embedding of a time series x. |
| `scan` | Grid search over (E, tau) with cross-validation. |
| `select` | Select best (E, tau) from scan results. |
`lagged_embed(x: np.ndarray, tau: int, e: int)`

Lagged embedding of a time series x.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | ndarray | 1D time series of shape (N,). | required |
| `tau` | int | Time delay. | required |
| `e` | int | Embedding dimension. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | Embedded array of shape (N - (e - 1) * tau, e). |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If x is not a 1D array, if tau or e is not positive, or if e * tau >= len(x). |
Notes
  • While open to interpretation, it is generally more intuitive to view the embedding as starting from the ((e - 1) * tau)-th element of the original time series and ending at the (len(x) - 1)-th element (the last value), rather than starting from the 0th element and ending at element len(x) - 1 - (e - 1) * tau.
  • This distinction reflects whether we think of “attaching past values to the present” or “attaching future values to the present”. The information content of the result is the same either way.
  • The use of reversed in the implementation emphasizes this perspective.

Examples:

```python
import numpy as np
from edm.embedding import lagged_embed

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
tau = 2
e = 3
E = lagged_embed(x, tau, e)
print(E)
# [[4 2 0]
#  [5 3 1]
#  [6 4 2]
#  [7 5 3]
#  [8 6 4]
#  [9 7 5]]
print(E.shape)
# (6, 3)
```
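The Notes above can be made concrete with a minimal sketch of such an embedding (an illustrative re-implementation, not the library's code; assumes NumPy and that the output should match the example output shown here):

```python
import numpy as np

def lagged_embed_sketch(x: np.ndarray, tau: int, e: int) -> np.ndarray:
    # Hypothetical re-implementation for illustration only; the actual
    # lagged_embed may differ in validation and edge-case handling.
    n_rows = len(x) - (e - 1) * tau
    # reversed(range(e)) puts the most recent value in the leftmost column
    # and the oldest lag in the rightmost: row i ends at x[i + (e - 1) * tau],
    # i.e. each row "attaches past values to the present".
    cols = [x[k * tau : k * tau + n_rows] for k in reversed(range(e))]
    return np.column_stack(cols)

x = np.arange(10)
E = lagged_embed_sketch(x, tau=2, e=3)  # same (6, 3) result as the example
```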
`scan(x: np.ndarray, Y: np.ndarray | None = None, *, E: list[int], tau: list[int], n_ahead: int = 1, split: SplitFunc | None = None, predict: PredictFunc | None = None, metric: MetricFunc | None = None) -> np.ndarray`

Grid search over (E, tau) with cross-validation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `x` | ndarray, shape (N,) | Time series to embed. | required |
| `Y` | ndarray or None, shape (N,) or (N, M) | Prediction target. If None, self-prediction (Y = x). | None |
| `E` | list[int] | Embedding dimension candidates. | required |
| `tau` | list[int] | Time delay candidates. | required |
| `n_ahead` | int | Prediction horizon (steps ahead). | 1 |
| `split` | SplitFunc or None | Callable (n: int) -> list[Fold]. Defaults to sliding_folds. | None |
| `predict` | PredictFunc or None | Prediction function. Defaults to simplex_projection. | None |
| `metric` | MetricFunc or None | Evaluation metric. Defaults to mean_rho. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| scores | ndarray, shape (len(E), len(tau), K_max) | Per-fold CV metric for each (E, tau) combination. K_max is the maximum number of folds across all E values. Entries where the fold does not exist are NaN. |
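To illustrate the shape convention of the returned array, here is a synthetic scores array (hypothetical values, not produced by a real scan call) and the per-(E, tau) fold mean one would typically compute from it:

```python
import numpy as np

E = [2, 3]
tau = [1, 2]
K_max = 3  # maximum number of folds across all E values

# Hypothetical per-fold correlations; NaN marks a fold that does not
# exist for that (E, tau) combination.
scores = np.array([
    [[0.90, 0.85, 0.88], [0.80, 0.82, np.nan]],
    [[0.95, 0.91, np.nan], [0.70, np.nan, np.nan]],
])

# Mean over valid folds for each (E, tau); NaN entries are ignored.
mean_scores = np.nanmean(scores, axis=-1)
```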
`select(scores: np.ndarray, *, E: list[int], tau: list[int]) -> tuple[int, int, float]`

Select best (E, tau) from scan results.

Ranks each (E, tau) by mean - SE where SE is the standard error of the mean across folds. This penalises combinations whose scores vary widely across folds (unstable predictions) and those with fewer valid folds (less certainty), favouring parameters we are confident perform well.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `scores` | ndarray, shape (len(E), len(tau), K_max) | Output of scan. | required |
| `E` | list[int] | Embedding dimension candidates (same as passed to scan). | required |
| `tau` | list[int] | Time delay candidates (same as passed to scan). | required |

Returns:

| Type | Description |
| --- | --- |
| (best_E, best_tau, best_score) | best_score is the mean over folds (not the adjusted value) so that it remains directly interpretable. |
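The mean - SE ranking described above can be sketched as follows. This is a hypothetical re-implementation (`select_sketch`, with made-up score values) for illustration; the real select may differ in tie-breaking and in how it handles degenerate folds:

```python
import numpy as np

def select_sketch(scores: np.ndarray, E: list[int], tau: list[int]) -> tuple[int, int, float]:
    # Sketch of the mean - SE rule; assumes every (E, tau) slice has at
    # least two valid (non-NaN) folds so the sample SE is defined.
    k = np.sum(~np.isnan(scores), axis=-1)            # valid folds per combo
    mean = np.nanmean(scores, axis=-1)                # mean over valid folds
    se = np.nanstd(scores, axis=-1, ddof=1) / np.sqrt(k)
    # Fewer folds inflate SE, so unstable or sparsely validated combos
    # are penalised even if their raw mean is high.
    i, j = np.unravel_index(np.nanargmax(mean - se), mean.shape)
    # Report the plain fold mean, not the adjusted value.
    return E[i], tau[j], float(mean[i, j])

# Hypothetical per-fold scores: E=3, tau=1 has both a high mean and low
# spread, so it wins under mean - SE.
scores = np.array([
    [[0.90, 0.85, 0.88], [0.80, 0.82, 0.78]],
    [[0.95, 0.91, 0.93], [0.70, 0.72, 0.71]],
])
best = select_sketch(scores, E=[2, 3], tau=[1, 2])
```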