Skip to content

How does one add DTW to metrics? #363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
a-berg opened this issue Aug 31, 2021 · 1 comment
Open

How does one add DTW to metrics? #363

a-berg opened this issue Aug 31, 2021 · 1 comment

Comments

@a-berg
Copy link

a-berg commented Aug 31, 2021

First of all let me thank you for this package, and say: great work!

Is your feature request related to a problem? Please describe.
I am working with timeseries data. For now I have applied K-means with L2 norm with okay-ish results, however I wanted to try other distance functions to see if these results could be improved. This distance is already implemented in the package dtw-python. After some hours I haven't been able to make FDA K-means work with dtw, I have been trying to wrap the function but I am at a loss without knowing more about the inner workings of your library.

Describe the solution you'd like
Some documentation on how to adapt pre-existing distance functions to your API. I am willing to contribute code if some guidance is provided.

Describe alternatives you've considered
I have considered switching to scikit-learn. After all, DTW is meant for timeseries data so the functional approach that skfda provides isn't as critical as when applying Euclidean distance. I however think that in order to automate the process and do hyperparameter optimization, it's better if I only use skfda.

Additional context

@vnmabus
Copy link
Member

vnmabus commented Aug 31, 2021

First, I will explain things as they are in the current development branch, as we will release it as a new version in a few days. Remember that with pip you can install a package from a Github branch, even if it has not yet been released in PyPI.

A metric is a Python callable with the following definition (in your case MetricElementType should be FDataGrid, as DTW cannot be implemented for FDataBasis as far as I know):

class Metric(Protocol[MetricElementType]):
    """Protocol for a metric between two elements of a metric space."""

    @abstractmethod
    def __call__(
        self,
        __e1: MetricElementType,
        __e2: MetricElementType,
    ) -> np.ndarray:
        pass

It receives two FDataGrid objects with the same length, and returns a 1d array where the i-th element has the distance between the i-th elements of both FDataGrid.

You can either create a function with the prototype of __call__, or create a class (and maybe inherit from Metric[FDataGrid] so that Mypy helps you with the typing). The class is better if you want to pass parameters to the metric (such as p for a Lp metric).

In order to implement the function, you probably will need to access the grid_points attribute of the involved FDataGrid instances, as well as the actual function values in the data_matrix attribute. Those are explained in https://fda.readthedocs.io/en/latest/auto_tutorial/plot_getting_data.html#the-fdatagrid-class .

I am not familiar with dtw-python. I assume that it works only with univariate scalar-valued functions. In that case you probably will need to use gridpoints[0] and data_matrix[..., 0] in order to pick the grid points for the first (and only) dimension, and the values of the first (and only) coordinate of the function return.

That would be enough to use it. If you want to optimize also the pairwise computation of that metric, it is also possible to override the default pair-by-pair computation, as it is done for the L2 case here:

@pairwise_metric_optimization.register
def _pairwise_metric_optimization_lp_fdata(
metric: LpDistance,
elem1: FData,
elem2: Optional[FData],
) -> np.ndarray:
from ...misc import inner_product, inner_product_matrix
vector_norm = metric.vector_norm
if vector_norm is None:
vector_norm = metric.p
# Special case, the inner product is heavily optimized
if metric.p == vector_norm == 2:
diag1 = inner_product(elem1, elem1)
diag2 = diag1 if elem2 is None else inner_product(elem2, elem2)
if elem2 is None:
elem2 = elem1
inner_matrix = inner_product_matrix(elem1, elem2)
distance_matrix_sqr = (
-2 * inner_matrix
+ diag1[:, np.newaxis]
+ diag2[np.newaxis, :]
)
np.clip(
distance_matrix_sqr,
a_min=0,
a_max=None,
out=distance_matrix_sqr,
)
return np.sqrt(distance_matrix_sqr)
return NotImplemented

I also wanted to say that if it is easier for you, scikit-fda is designed to be as compatible as possible with scikit-learn, and we reuse its hyperparameter tuning utilities. You can even mix both in the same Pipeline, see here for examples:

https://fda.readthedocs.io/en/latest/auto_tutorial/plot_skfda_sklearn.html

Note that as dtw-python has a GPL license, I cannot use it in our project, which has an incompatible BSD3 license. Thus, I cannot accept a PR that uses that library. If you implement it on your own or you use a more permissive library I could accept that implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants