Natural Language Processing (NLP) is the artificial intelligence-driven process of enabling software to interpret human language in text and other data, and it has transformed data analytics across many industries. In this course, we will showcase NLP use cases that are relevant to the HPC and STEM communities. A task common to almost all researchers is analysing the literature, and automatic text mining techniques can save a great deal of time when reviewing and summarising papers. Researchers traditionally carry out this work at the early stage of a research project, and it is time-consuming and labour-intensive. NLP can not only speed up this process but also provide a more comprehensive overview by extracting the relevant papers from a large, globally connected scholarly database. In this course, we will teach attendees how to clean text data, analyse it, and produce a quick overview of an entire literature database.
Basic programming experience with Python is highly recommended. Familiarity with Python text-processing packages such as NLTK is an advantage.
Attendees should ideally have some knowledge of basic Machine Learning and Deep Learning theory, and intend to use AI/ML and supercomputers to accelerate their research.
We will use the NCI Australian Research Environment (ARE) service and the Gadi supercomputer. Attendees are encouraged to review the ARE User Guide beforehand for background information.
This course series is designed to help researchers apply NLP to text mining and use the Gadi supercomputer to accelerate their research. It aims to help attendees:
Understand the basics of NLP
Understand the pre-processing steps for text data
Understand how to apply basic NLP techniques
Understand how to run NLP applications on Gadi
Know how to use a Python deep learning platform: TensorFlow
Know how to set up a Python environment on Gadi
Know popular topic modeling, clustering and visualisation methods: LDA, k-means clustering and t-SNE
Know popular text mining methods: summarization, topic modeling, text classification and keyword extraction
Know popular deep learning (DL) models such as the Transformer
Know how to distribute model training on Gadi