Natural Language Processing Training

NLP Training

Due to COVID-19, our training courses will be taught in an online classroom.

Receive in-depth knowledge from industry professionals, test your skills with hands-on assignments & demos, and get access to valuable resources and tools.

This course is a deep dive into Natural Language Processing (NLP). The lessons focus on NLP applications such as sentiment analysis, on feature extraction, and on several NLP models, including state-of-the-art algorithms such as BERT. By the end of the course, you will have gained hands-on experience with semi-supervised and unsupervised machine learning methods for NLP, as well as a theoretical understanding of the concepts and models. Prerequisites: experience with Python and machine learning.

Are you interested? Contact us and we will get in touch with you.


About the training & classes

The NLP training is split into 3 days. A detailed description of each class follows:

Natural Language Processing: I

In the first NLP class of the series, we will teach you the foundations needed to analyze linguistic data and understand basic Natural Language Processing concepts.

The training starts with a discussion of the challenges of linguistic data, followed by techniques for handling, cleaning, and normalizing text data. The lesson concludes with two language models: *N-grams* and *word embeddings*. The latter will be discussed further in NLP II, as it requires a better understanding of RNNs and Deep Learning.

The theoretical lesson is followed by a few lab exercises in which participants become familiar with the main NLP toolkits used in the industry (NLTK, spaCy, Gensim) and train a Bayesian model to predict the author's gender using word frequency features from Twitter data.
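To give a flavor of this kind of lab exercise, here is a minimal sketch of a word-frequency classifier. It assumes multinomial Naive Bayes with add-one smoothing, which is one common choice of "Bayesian model" and not necessarily the one used in the course. The toy sentences, the labels "A" and "B", and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: two author groups labeled "A" and "B".
training = [
    ("great match today go team".split(), "A"),
    ("team won the match".split(), "A"),
    ("new recipe for dinner tonight".split(), "B"),
    ("dinner was a great recipe".split(), "B"),
]
model = train_naive_bayes(training)
print(predict(model, "the team match".split()))
print(predict(model, "recipe dinner".split()))
```

With real data, the token lists would come from a tokenizer in one of the toolkits above, and the model would be evaluated on held-out examples.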

The training includes theory, demos, and hands-on exercises.

By the end of the training participants will have gained knowledge about:

  • Techniques for handling, cleaning, and normalizing linguistic data (tokenization, normalization, stemming, lemmatization, stop words)
  • Modelling language to derive insights (statistical language modelling; word frequency: bag of words and TF-IDF, i.e. term frequency-inverse document frequency; N-grams; word embeddings and vector representations of words)
  • Useful methods for topic classification and sentiment analysis
  • The main NLP toolkits used in the industry (NLTK, spaCy, Gensim), introduced in the lab
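As an illustration of several of the techniques listed above, the following sketch tokenizes a small corpus, removes stop words, extracts bigrams, and computes TF-IDF weights using only the Python standard library. The corpus, the stop-word list, and the function names are assumptions made for this example; in practice a toolkit such as NLTK or spaCy would handle tokenization.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; toolkits ship larger ones).
STOP_WORDS = {"the", "a", "is", "on", "and"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop very frequent words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # the bag-of-words representation
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

corpus = ["The cat sat on the mat.", "The dog sat on the log.", "A cat and a dog."]
docs = [remove_stop_words(tokenize(text)) for text in corpus]
bigrams = ngrams(docs[0], 2)
weights = tf_idf(docs)
print(docs[0])
print(bigrams)
print(weights[0])
```

In the first document, "mat" receives a higher weight than "cat" because it appears in only one document of the corpus, which is the effect TF-IDF is designed to capture.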
Natural Language Processing: II

The second class in the NLP series will provide you with an overview of the NLP models most widely used today and of how to implement them as part of a Machine Learning pipeline for text data.

The training starts with a survey of a few relevant NLP models, from simple ones like *Bag of Words*, through neural embedding models like *word2vec, fastText,* and the LSTM-based *ELMo*, to the more recent transformers, such as *BERT*. The theoretical NLP model survey is followed by two labs in which participants retrain various NLP models on new tasks: author profiling and sentiment analysis.

The training includes theory, demos, and hands-on exercises.

By the end of the training, participants will have gained knowledge about the characteristics of the following language models:

  • Bag of Words - advantages and disadvantages, the information it captures
  • word2vec - advantages and disadvantages, model and architecture, classification task it's trained on
  • fastText - advantages and disadvantages, model and architecture, types of tokens it's trained on
  • ELMo - advantages over fastText and word2vec, difference in architecture
  • Transformers
  • BERT - advantages over LSTM-based language models, tasks it's trained on, role of attention in the model
  • Implementing a language model as part of a machine learning pipeline for text data
  • The information required in the language model for the machine learning model: word frequency, general meaning (semantics), context-sensitive meaning (ambiguity, pragmatics), syntactic meaning (parts of speech, grammatical sentences), text summarization
  • Performing transfer learning: retraining the language model on new data and for a new task (in this training, sentiment analysis and author profiling)
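The role of attention mentioned above can be illustrated with a minimal sketch of scaled dot-product attention, the core operation of transformer models. This is a simplification under stated assumptions: the 2-dimensional vectors are hypothetical, and the learned query/key/value projections and multiple heads used by models such as BERT are omitted.

```python
import math

def softmax(scores):
    """Convert raw scores to positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional vectors for a three-token input.
tokens = ["river", "bank", "money"]
vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 3) for w in row])
```

Each row of weights shows how strongly one token draws on every other token when its context-sensitive representation is built, which is what lets the same word receive different representations in different sentences.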
Natural Language Processing: III

In the last NLP class of the series, we will teach you how to use semi-supervised and unsupervised Machine Learning methods, with NLP tasks and text data as a case study.

The class starts with a discussion of the problem of scarce data in NLP, especially for languages other than English. We discuss solutions for this problem -- and indeed for the shortage of good data in general -- namely *pre-training, self-training* and *consistency regularization*.

The lesson ends with the introduction of *LDA (Latent Dirichlet Allocation)*, an unsupervised learning model useful in topic modeling. The theoretical lesson on semi-supervised and unsupervised learning is followed by a lab exercise in which participants use an *LDA* model to extract features from the text that are then used to perform author profiling (the same task used throughout the NLP sequence).

The training includes theory, demos, and hands-on exercises.

By the end of the training participants will have gained knowledge about the characteristics of the following semi-supervised and unsupervised methods:

  • Pretraining - reviewing the pros and cons of using pretrained models (e.g. word2vec, fastText, BERT) and autoencoders in supervised models
  • Self-training - pros and cons of using this simple method when data is scarce
  • Consistency regularization - pros and cons of generating synthetic input noise to improve model robustness
  • LDA - application and features
  • Implementing an LDA model as pretraining for an NLP machine learning pipeline
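To make the idea of LDA as a feature extractor concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written with only the Python standard library. The toy documents, the hyperparameter values, and the function name are assumptions chosen for illustration; a real pipeline would typically rely on a toolkit such as Gensim and a much larger corpus.

```python
import random

def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic mixtures."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    doc_topic = [[0] * n_topics for _ in docs]                # topic counts per document
    topic_word = [[0] * len(vocab) for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, z = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to the conditional probability.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + beta * len(vocab))
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1
    # Smoothed topic proportions, usable as document features.
    return [
        [(count + alpha) / (len(doc) + alpha * n_topics) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]

docs = [
    "goal match team goal".split(),
    "team match win".split(),
    "recipe dinner oven".split(),
    "oven recipe bake dinner".split(),
]
features = lda_gibbs(docs, n_topics=2)
for doc, mixture in zip(docs, features):
    print(doc, [round(p, 2) for p in mixture])
```

Each document is reduced to a short vector of topic proportions. These vectors can then be passed as input features to a supervised classifier, for example for the author profiling task.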