David Przybilla


GitHub / Twitter


NLP Software Engineer at Idio

London, UK

Joined Idio in 2013, at an early startup stage; currently working as part of the R&D team.

The role included both setting up NLP pipelines and taking care of the infrastructure behind them (AWS, Chef).

Projects included:

  • Bootstrapping/Managing Ontologies
  • Using Client corpora to create Named Entity Linking Systems
  • Domain adaptation of Named Entity Systems
  • Using Entities as features for Recommendation Systems
  • Using tools such as Spark to crunch big datasets
  • Creation of Multilingual Word Embeddings

Technologies used

Scala, Python, Spark, SBT, Neo4j, Wikidata, Freebase, DBpedia, AWS

Google Summer of Code Mentor


Participated as a mentor, guiding students in making contributions to DBpedia Spotlight. The projects aimed at better entity linking by improving disambiguation, and at using Apache Spark to improve language model creation.

Technologies used

Scala, Python, Java, Apache Spark

Software Engineer at Meridean


Joined Meridean remotely after contributing to open-source projects they were interested in. Meridean uses NLP technologies to analyse brand perception and brand health.

My tasks focused on:

  • Improving the knowledge resources for Spanish NLP
  • Setting up the pipeline for classifying and clustering text by industry
  • Improving the sentiment analysis pipeline

Technologies used

Scala, Python, Java, Apache Spark

Research Assistant at Saarland University

Saarland, Germany

Part of the Smile Project. My tasks were varied and included refactoring code and annotating evaluation data. The most relevant task was implementing new semantic representations to find paraphrases in text.

Technologies used

Python, Java, Scala, C++

Young Researcher at Universidad del Valle, Colombia

Cali, Colombia

After graduating, joined a program funded by the Colombian government called “Young Researcher”. Worked on a project focused on graphs and optimisation problems. Part of my tasks involved helping professors give lectures and tutorials on programming and algorithms.

Technologies used

Java, LaTeX, C++


MSc Language Science and Technology

University of Saarland
Saarbruecken, Germany

Bachelor in Computer Science

Universidad del Valle, Colombia
Cali, Colombia



Bootstrapped the first Colombian community around open data.
My activities include:

  • Democratic access to public Datasets
  • Creation of Scrapers
  • Community Management (Engaging with Journalists and Software Engineers)
  • Analysis of datasets (contracts data, unstructured data, e.g. text)

More information on my [Medium blog](https://medium.com/@dav009/).

Wiki2vec (2015) - https://github.com/idio/wiki2vec

As part of an internal hackathon at Idio, I assembled a pipeline for creating word embeddings for Wikipedia entities.

Spotlight Model Editor (2014) - https://github.com/idio/spotlight-model-editor

Built as part of an internal hackathon at Idio, this project allows modifying entity models as a first milestone toward creating domain-specific entity linkers.