How does one add DTW to metrics? #363

a-berg · 2021-08-31T16:10:34Z

First of all let me thank you for this package, and say: great work!

Is your feature request related to a problem? Please describe.
I am working with time series data. So far I have applied K-means with the L2 norm, with okay-ish results, but I wanted to try other distance functions, in particular dynamic time warping (DTW), to see whether the results could be improved. DTW is already implemented in the dtw-python package. After some hours I still haven't been able to make FDA K-means work with DTW: I have been trying to wrap the function, but I am at a loss without knowing more about the inner workings of your library.

Describe the solution you'd like
Some documentation on how to adapt pre-existing distance functions to your API. I am willing to contribute code if some guidance is provided.

Describe alternatives you've considered
I have considered switching to scikit-learn. After all, DTW is meant for time series data, so the functional approach that skfda provides isn't as critical as when applying the Euclidean distance. However, I think that in order to automate the process and do hyperparameter optimization, it is better if I use only skfda.

vnmabus · 2021-08-31T17:51:07Z

First, I will explain things as they are in the current development branch, as we will release it as a new version in a few days. Remember that with pip you can install a package from a GitHub branch, even if it has not yet been released on PyPI.

A metric is a Python callable with the following definition (in your case MetricElementType should be FDataGrid, as DTW cannot be implemented for FDataBasis as far as I know):

class Metric(Protocol[MetricElementType]):
    """Protocol for a metric between two elements of a metric space."""

    @abstractmethod
    def __call__(
        self,
        __e1: MetricElementType,
        __e2: MetricElementType,
    ) -> np.ndarray:
        pass

It receives two FDataGrid objects with the same number of observations, and returns a 1d array whose i-th element is the distance between the i-th observations of the two FDataGrid objects.

You can either create a function with the prototype of __call__, or create a class (and maybe inherit from Metric[FDataGrid] so that Mypy helps you with the typing). The class is better if you want to pass parameters to the metric (such as p for a Lp metric).

In order to implement the function, you probably will need to access the grid_points attribute of the involved FDataGrid instances, as well as the actual function values in the data_matrix attribute. Those are explained in https://fda.readthedocs.io/en/latest/auto_tutorial/plot_getting_data.html#the-fdatagrid-class .
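As a rough illustration of the class form, here is a minimal sketch of a parameterised metric. `SampledPNormDistance` is a hypothetical name, not part of scikit-fda, and it is not the library's Lp distance: it simply takes the p-norm of the pointwise differences of the sampled values and ignores the spacing of the grid points. It assumes that `data_matrix` has shape `(n_samples, n_points, 1)` and that both arguments are sampled on the same grid.

```python
import numpy as np


class SampledPNormDistance:
    """Hypothetical metric: p-norm of the pointwise sample differences."""

    def __init__(self, p: float = 2.0) -> None:
        # Parameters of the metric are stored at construction time.
        self.p = p

    def __call__(self, e1, e2) -> np.ndarray:
        # Assumption: data_matrix has shape (n_samples, n_points, 1),
        # so [..., 0] keeps the only coordinate of the function values.
        diff = e1.data_matrix[..., 0] - e2.data_matrix[..., 0]
        # One distance per paired observation: shape (n_samples,).
        return np.linalg.norm(diff, ord=self.p, axis=1)
```

An instance such as `SampledPNormDistance(p=1)` is then a callable with the same prototype as `__call__` in the protocol above.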

I am not familiar with dtw-python. I assume that it works only with univariate scalar-valued functions. In that case you will probably need to use grid_points[0] and data_matrix[..., 0] in order to pick the grid points for the first (and only) dimension, and the first (and only) coordinate of the function values.
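Putting the pieces together, a function-style DTW metric could look roughly like the sketch below. It does not use dtw-python: `dtw_distance` is a plain NumPy implementation of the textbook dynamic-programming recurrence with an absolute-difference local cost, and both `dtw_distance` and `dtw_metric` are hypothetical names. The sketch assumes univariate scalar-valued data and only reads `data_matrix[..., 0]`; it does not use the grid points.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Textbook O(len(x) * len(y)) DTW between two 1d sequences."""
    n, m = len(x), len(y)
    # cost[i, j] is the minimal accumulated cost of aligning
    # x[:i] with y[:j]; the border is infinite except cost[0, 0].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i, j] = step + min(
                cost[i - 1, j],      # advance only in x
                cost[i, j - 1],      # advance only in y
                cost[i - 1, j - 1],  # advance in both
            )
    return float(cost[n, m])


def dtw_metric(e1, e2) -> np.ndarray:
    """Paired DTW distances between two FDataGrid-like objects."""
    # Assumption: data_matrix has shape (n_samples, n_points, 1).
    values1 = e1.data_matrix[..., 0]
    values2 = e2.data_matrix[..., 0]
    # The i-th output is the distance between the i-th observations.
    return np.array(
        [dtw_distance(a, b) for a, b in zip(values1, values2)],
    )
```

The double Python loop is slow for long series; it is meant to show the shape of the inputs and outputs rather than to be an efficient implementation.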

That would be enough to use it. If you also want to optimize the pairwise computation of that metric, it is possible to override the default pair-by-pair computation, as is done for the L2 case here:

scikit-fda/skfda/misc/metrics/_lp_distances.py

Lines 98 to 136 in cf71fad

    
@pairwise_metric_optimization.register
def _pairwise_metric_optimization_lp_fdata(
    metric: LpDistance,
    elem1: FData,
    elem2: Optional[FData],
) -> np.ndarray:

    from ...misc import inner_product, inner_product_matrix

    vector_norm = metric.vector_norm

    if vector_norm is None:
        vector_norm = metric.p

    # Special case, the inner product is heavily optimized
    if metric.p == vector_norm == 2:
        diag1 = inner_product(elem1, elem1)
        diag2 = diag1 if elem2 is None else inner_product(elem2, elem2)

        if elem2 is None:
            elem2 = elem1

        inner_matrix = inner_product_matrix(elem1, elem2)

        distance_matrix_sqr = (
            -2 * inner_matrix
            + diag1[:, np.newaxis]
            + diag2[np.newaxis, :]
        )

        np.clip(
            distance_matrix_sqr,
            a_min=0,
            a_max=None,
            out=distance_matrix_sqr,
        )

        return np.sqrt(distance_matrix_sqr)

    return NotImplemented
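For a DTW metric, an analogous pairwise shortcut could be sketched as follows. This is only an assumption about what such an optimization might compute, written on plain arrays of sampled values: `pairwise_dtw` and `dtw_distance` are hypothetical names, and the registration with `pairwise_metric_optimization` shown above is deliberately left out. As in the L2 code, a missing second argument means "compare the first set with itself", in which case only the upper triangle is computed and mirrored.

```python
from typing import Optional

import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Textbook DTW with absolute-difference local cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            cost[i, j] = step + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1],
            )
    return float(cost[n, m])


def pairwise_dtw(
    values1: np.ndarray,
    values2: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Matrix of DTW distances between the rows of two 2d arrays."""
    symmetric = values2 is None
    if symmetric:
        values2 = values1

    result = np.zeros((len(values1), len(values2)))
    for i, a in enumerate(values1):
        # In the symmetric case the diagonal stays at zero and only
        # the entries above it are computed.
        start = i + 1 if symmetric else 0
        for j in range(start, len(values2)):
            result[i, j] = dtw_distance(a, values2[j])
            if symmetric:
                result[j, i] = result[i, j]
    return result
```

With an FDataGrid, the 2d arrays passed in would be `data_matrix[..., 0]` of each operand, under the same univariate assumption as before.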

I also wanted to say that if it is easier for you, scikit-fda is designed to be as compatible as possible with scikit-learn, and we reuse its hyperparameter tuning utilities. You can even mix both in the same Pipeline, see here for examples:

https://fda.readthedocs.io/en/latest/auto_tutorial/plot_skfda_sklearn.html

Note that, as dtw-python has a GPL license, I cannot use it in our project, which has an incompatible BSD 3-Clause license. Thus, I cannot accept a PR that uses that library. If you implement DTW on your own, or use a library with a more permissive license, I could accept that implementation.

a-berg added the enhancement label Aug 31, 2021

Clej mentioned this issue Jan 13, 2022

Metrics: Soft dynamic-time-warping divergence for FDataGrid #412

Open
