“Do not forget what you have learned of our past, Rodimus. From its lessons the future is forged.” Optimus Prime
This story started in 2018, when Google published a paper called “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, presenting a new way to solve several NLP problems with a single tool. BERT is a novel method to pre-train language representations that can be fine-tuned for new functionality by making small changes to the architecture. Hence the word Transformers has been used to name these types of ML models.
At first, BERT was conceived mainly to solve “Question Answering”, though it was later adapted to perform text classification, sentiment analysis and named entity recognition, among other tasks. The key aspect of this solution is the use of Embeddings: pre-trained language representations that serve as the starting point for solving other problems. At the same time, several variations have emerged, such as ULMFiT, GPT and ELMo, which are all unidirectional or shallowly bidirectional. …
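To make the "pre-trained representations as a starting point" idea concrete, here is a minimal toy sketch in NumPy. Everything in it is an assumption for illustration: the "pre-trained" embeddings are just random vectors standing in for what a model like BERT would produce, and the task head is a tiny logistic-regression classifier trained while the embeddings stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for pre-trained embeddings. In a real setting these vectors
# would come from a model such as BERT; here they are random (assumption).
vocab = {"great": 0, "awful": 1, "movie": 2, "film": 3}
emb_dim = 8
pretrained = rng.normal(size=(len(vocab), emb_dim))  # frozen representations

def embed(tokens):
    """Average the frozen embeddings of a token list: the feature extractor."""
    return pretrained[[vocab[t] for t in tokens]].mean(axis=0)

# A tiny hypothetical sentiment dataset, expressed as token lists.
X = np.stack([embed(s) for s in
              [["great", "movie"], ["awful", "film"],
               ["great", "film"], ["awful", "movie"]]])
y = np.array([1, 0, 1, 0])

# Task head: logistic regression trained by gradient descent while the
# embeddings stay fixed -- the essence of the fine-tuning recipe.
w, b = np.zeros(emb_dim), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid predictions
    grad = p - y                          # gradient of the cross-entropy loss
    w -= 0.5 * X.T @ grad / len(y)
    b -= 0.5 * grad.mean()

preds = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
print(preds)
```

Only the small head (`w`, `b`) is learned; in full fine-tuning the pre-trained weights would also be updated, but the division of labor, generic representations plus a cheap task-specific layer, is the same.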
If you are looking for a map to discover new datasets, you are in the right place.
“My philosophy is that worrying means you suffer twice.” Newt Scamander
It is fun to imagine datasets as mythical creatures with their own personality and traits. Don’t let their surly appearance fool you. Datasets are gentle creatures willing to reveal their secrets when you spend enough time taming them.
If by chance you encounter a wild specimen, do not despair: plan ahead and you will subdue the beast. In your first encounter, analyze its appearance, presentation and size, and whether it lives in a database or is scattered across several folders. When you feel more at ease, carry out a second, deeper inspection: study the nature of the data and understand the situation it describes (this step is crucial). Finally, when you feel more confident, find the best approach to unlock the wisdom inside the creature and write a code routine to process it. …
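The first-encounter ritual above can be sketched with pandas. The dataset here is a made-up inline CSV (an assumption for the sake of a self-contained example); in the wild it would be a file on disk or a database table.

```python
import io
import pandas as pd

# Hypothetical raw dataset standing in for a wild specimen.
raw = io.StringIO(
    "species,habitat,weight_kg\n"
    "griffin,mountains,310.5\n"
    "kelpie,lakes,\n"
    "selkie,sea,62.0\n"
)

df = pd.read_csv(raw)

# First encounter: appearance and size.
print(df.shape)        # (rows, columns)
print(df.dtypes)       # the nature of each column
print(df.head())       # a quick look at its face

# Second, deeper inspection: missing values and basic statistics.
print(df.isna().sum())
print(df.describe())
```

A few minutes with `shape`, `dtypes`, `head`, `isna` and `describe` usually reveals enough of the creature's temperament to decide how the processing routine should be written.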
I have been navigating the machine learning space for 8 years, and it can sometimes be an overwhelming task: you need to read a lot of papers, learn programming languages and implement several strategies in a very short time. In this post I share some of my favorite resources to make your journey more enjoyable.
From my personal experience I can say that I could not have made it this far without the collaboration of all my fellow travelers. Fortunately, the ML community has developed very useful guides for learning new techniques in less time. …