Internships

Quantifying the semantic gap

Here you can find the details of the internship "Quantifying the semantic gap" at the company ML6.

Details
Name: Quantifying the semantic gap
Company: ML6
Description:

The adoption of NLP models in real-world use cases has also meant the rise of niche, domain-specific language models. Do you need a language model for Polish legal text? Sure thing. Or how about a language model for Swedish medical texts? Look no further. However, in many real-world situations, this raises the question of when to explore a custom language model and when an “out-of-the-box” language model will suffice. In some fields, such as the medical field, the answer is obvious, as most of the important words (e.g., names of diseases, Latin anatomical terms, etc.) are out-of-vocabulary for general-purpose language models, but in other domains, such as legal texts or technical manuals, it is much less obvious.

The goal of this project is to estimate, based on known heuristics, the expected effect of using a custom language model instead of an existing one. The current best approach is to compare the n-grams from your domain-specific texts with those from the texts that the existing solution was trained on and to impose some (arbitrary) cut-off (e.g., if the n-gram overlap is under 30%, we explore using a custom language model). However, this method is neither very quantitative nor very rigorous.
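To make the n-gram overlap heuristic concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (`ngrams`, `ngram_overlap`, `needs_custom_model`) are hypothetical, tokenization is plain lowercased whitespace splitting, and "overlap" is taken to mean the fraction of distinct domain n-grams that also appear in the reference texts. The posting does not specify any of these choices; only the 30% cut-off comes from the description above.

```python
def ngrams(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(domain_texts, reference_texts, n=2):
    """Fraction of distinct domain n-grams that also occur in the reference texts.

    Assumption: whitespace tokenization on lowercased text. A real setup
    would more likely reuse the tokenizer of the existing language model.
    """
    domain = set()
    for text in domain_texts:
        domain |= ngrams(text.lower().split(), n)

    reference = set()
    for text in reference_texts:
        reference |= ngrams(text.lower().split(), n)

    if not domain:
        return 0.0  # no domain n-grams to compare
    return len(domain & reference) / len(domain)


def needs_custom_model(domain_texts, reference_texts, n=2, cutoff=0.30):
    """Apply the (arbitrary) cut-off: low overlap suggests exploring a custom model."""
    return ngram_overlap(domain_texts, reference_texts, n) < cutoff


if __name__ == "__main__":
    domain = ["the court ruled today"]
    reference = ["the court is closed today"]
    print(ngram_overlap(domain, reference, n=1))
    print(ngram_overlap(domain, reference, n=2))
    print(needs_custom_model(domain, reference, n=2))
```

The sketch also shows why the heuristic is hard to defend rigorously: the result depends on the choice of `n`, the tokenization, and the cut-off, none of which is tied to the downstream performance gain the project aims to estimate.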

Target profiles:
    In industries:
      Required special knowledge:

      - Strong analytical abilities, knowledge of different statistical methods, comfort with mathematics, and familiarity with research studies.
      - Strong interest in Computer Vision / NLP / Other subdomain [preferred]
      - Familiarity with statistical analysis languages and tools such as Python and SQL.
      - Excellent verbal and written communication in English.
      - You are currently pursuing a degree in computer science or a related field.

      Duration: min 6 weeks
      Paid: No
      Net wage: -
      Foreign: No