Natural Language Processing (NLP) is the artificial intelligence-driven process of making human language decipherable to software, and it has revolutionised data analytics across industries. In this course, we will showcase NLP use cases that are relevant to the HPC and STEM communities.

A common concern for all researchers is the analysis of literature. Literature reviewing and text summarising are traditionally done by hand at the early stage of a research project, which is time consuming and labour intensive. Automatic text mining techniques can save a large amount of that time. NLP not only speeds up this process but can also provide a much more comprehensive overview by extracting the relevant papers from a global, connected database of scholarly literature. In this course, we will teach how to clean text data, analyse the text and produce a quick overview of a whole literature database.

 

Date/Time

Online Workshop

1 pm - 5 pm Canberra time

Registration is open now



Prerequisites

Basic programming experience with Python is highly recommended. Familiarity with Python text-processing packages such as NLTK is advantageous.

Attendees will ideally know some basic theory of Machine Learning and Deep Learning, and intend to use AI/ML and supercomputers to boost their research.

We will use the NCI ARE service and the Gadi Supercomputer. Attendees are encouraged to review the ARE User Guide for background information.

Objectives

This course series is designed to help researchers apply NLP to text mining and take advantage of the Gadi supercomputer to boost their research. It aims to help attendees:

  • Understand the basics of NLP

  • Understand the pre-processing steps for text data

  • Understand how to apply basic NLP techniques

  • Understand how to run NLP applications on Gadi
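
As a rough illustration of the pre-processing step listed above, the sketch below lowercases a text, keeps alphabetic tokens and drops stop words using only the Python standard library. The function names `clean_text` and `top_keywords`, the small `STOP_WORDS` set and the sample abstracts are assumptions made for this example and are not taken from the course material.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; real projects usually
# use a fuller list such as the one shipped with NLTK).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def clean_text(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def top_keywords(documents, n=3):
    """Count cleaned tokens across all documents and return the n most common."""
    counts = Counter()
    for doc in documents:
        counts.update(clean_text(doc))
    return counts.most_common(n)


# Hypothetical sample abstracts, invented for this example.
abstracts = [
    "Deep learning for climate models on the Gadi supercomputer.",
    "Climate models are scaled with deep learning in HPC.",
]

print(clean_text(abstracts[0]))
print(top_keywords(abstracts, n=4))
```

In practice, a package such as NLTK would typically supply a fuller stop-word list and proper tokenisation; this is only a minimal sketch of the idea.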

Learning Outcomes

  • Know how to use a Python machine learning package: scikit-learn

  • Know how to use a Python deep learning framework: TensorFlow

  • Know how to set up a Python environment on Gadi

  • Know how to do text data processing: Lemmatization, Stemming, and Sentiment Analysis

  • Know popular topic modeling, clustering and visualisation methods: LDA, k-means clustering, and t-SNE

  • Know popular text mining methods: Summarization, Topic Modeling, Text Classification and Keyword Extraction

  • Know a popular DL architecture: the Transformer

  • Know how to distribute model training on Gadi
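
To give a flavour of the clustering outcome above, here is a minimal sketch that groups a few toy sentences using TF-IDF features and k-means in scikit-learn. The sample documents and the choice of two clusters are assumptions for illustration only; the course may use different data and settings.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents invented for this example: two about HPC, two about biology.
docs = [
    "gpu cluster parallel computing performance",
    "parallel computing on a gpu cluster",
    "protein folding and gene expression",
    "gene expression in protein folding studies",
]

# Turn each document into a TF-IDF vector, ignoring common English stop words.
vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(docs)

# Group the vectors into two clusters; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Print the cluster label assigned to each document.
for label, doc in zip(labels, docs):
    print(label, doc)
```

The cluster numbers themselves are arbitrary; what matters is which documents end up sharing a label.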
