Research similarity metrics and contribute to TF

Here you will find the details of the internship "Research similarity metrics and contribute to TF" at ML6.

Name: Research similarity metrics and contribute to TF
Company: ML6

“Currently, the state-of-the-art similarity metrics are only implemented in R. We want to port these to Python and implement them in frameworks like TensorFlow.”

Context of the internship
-The Wasserstein distance has been around for decades, but it has recently been causing a furore in ML. In essence, it measures how different two distributions are, and the result is a number between 0 and +inf.
-Now, we can use the Wasserstein distance as a metric for the degree of difference between two probability distributions, but on real-life data we have to go with a parametric version of it to estimate the actual Wasserstein distance between the two underlying distributions.
-The question that pops up is: how do we decide when two distributions are different using the Wasserstein distance? How do we go about hypothesis testing? 🤔
-We are not the first ones to think about this. Schefzik et al. have come up with a way to test this and implemented it in R.
So... we want to make this test available in Python and add it to SciPy and TensorFlow Data Validation.
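
To make the hypothesis-testing question above concrete, here is a minimal sketch in Python. It is not the test by Schefzik et al.; it is a plain permutation test, used here only as an assumption to illustrate the idea. It relies on SciPy's existing `scipy.stats.wasserstein_distance` for one-dimensional samples. The function name `wasserstein_permutation_test` and its parameters are hypothetical and chosen for this example.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def wasserstein_permutation_test(x, y, n_permutations=1000, seed=0):
    """Return (observed distance, permutation p-value) for two 1-D samples.

    Illustrative sketch only: a generic permutation test, not the
    procedure of Schefzik et al.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Empirical Wasserstein distance between the two samples.
    observed = wasserstein_distance(x, y)

    # Under the null hypothesis both samples come from the same
    # distribution, so the group labels are exchangeable.
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        d = wasserstein_distance(shuffled[:len(x)], shuffled[len(x):])
        if d >= observed:
            count += 1

    # The +1 terms keep the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    a = rng.normal(loc=0.0, scale=1.0, size=200)
    b = rng.normal(loc=2.0, scale=1.0, size=200)
    distance, p = wasserstein_permutation_test(a, b)
    print(f"distance={distance:.3f}, p-value={p:.4f}")
```

A small p-value suggests the two samples come from different distributions. The distance itself has no upper bound, which is why a reference distribution such as the permuted one is needed to judge whether an observed value is large.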

Target profiles:
    In industries:
      Required special knowledge:

      Duration: min 6 weeks
      Paid: No
      Net wage: -
      Foreign: No
      Contact: Julie Plusquin (Talent Partner)