Multimodal Online Misinformation Detection, 2020-2023

The University of Melbourne

Identifying misinformation manually is impractical given the high volume of information that circulates online every day. Most existing automated solutions detect fake content on social media platforms such as Facebook with limited accuracy, for three reasons: they fail to capture the holistic picture of data records that combine multiple modalities (e.g., text, image, video, audio), they rely heavily on manually fact-checked labels (supervised learning), and they cannot continuously accumulate knowledge to adapt to temporal changes (e.g., emergencies and events). This project bridged these research gaps by proposing a self-supervised multimodal approach for misinformation detection with online learning capabilities.

Neural Networks for Environmental Data Compression, 2021

The University of Melbourne and Australia's Bureau of Meteorology

Australia's Bureau of Meteorology generates multiple terabytes of environmental forecast information per day, which accumulates into large-scale datasets over time. To facilitate the cost-efficient transmission and storage of such datasets, this project investigated the applicability and efficacy of autoencoder neural network architectures for data compression and decompression. The project delivered a convolutional autoencoder-based neural network that yields a data compression ratio of around 10% while preserving the important information in data representing worldwide sea-surface height values.

Computer Vision-based Road Crack Analysis, 2021

The University of Melbourne and Department of Transport Victoria

This project developed a deep learning model based on transfer learning and semi-supervised learning to predict different crack types on roads from images. It also delivered a dashboard to visualize the identified road cracks.

Traffic Forecasting on Large Road Networks, 2021

The University of Melbourne and Department of Transport Victoria

This project developed a deep learning model to forecast traffic on large road networks impacted by events. The proposed model jointly preserves spatial and temporal dependencies on road networks by combining a GCN (Graph Convolutional Network) with an LSTM (Long Short-Term Memory) network, and predicts traffic profiles for different events, which are identified using an anomaly detection technique.
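
As a rough illustration of the spatial half of such a model, the sketch below applies one standard graph convolution step (symmetric normalization of the adjacency matrix with self-loops, followed by a linear map and ReLU) to a toy road graph. The three-segment graph, the speed values, and the function names are all hypothetical, and the code is not the project's implementation; in a GCN plus LSTM design, the per-time-step outputs of a layer like this would typically be passed as a sequence to the LSTM, which is omitted here.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each segment keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def graph_conv(adj_norm, features, weights):
    """One graph convolution layer: ReLU(adj_norm @ features @ weights)."""
    n, f_in, f_out = len(features), len(features[0]), len(weights[0])
    # Mix each node's features with those of its neighbours.
    mixed = [[sum(adj_norm[i][k] * features[k][f] for k in range(n))
              for f in range(f_in)] for i in range(n)]
    # Apply the linear map, then the ReLU non-linearity.
    return [[max(0.0, sum(mixed[i][f] * weights[f][o] for f in range(f_in)))
             for o in range(f_out)] for i in range(n)]


# Hypothetical toy network: three road segments connected in a line.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
speeds = [[60.0], [30.0], [50.0]]   # one feature per segment (assumed values)
weights = [[1.0]]                   # identity weight, for readability only

adj_norm = normalized_adjacency(adj)
smoothed = graph_conv(adj_norm, speeds, weights)
```

With the identity weight, each segment's output is a degree-weighted blend of its own speed and its neighbours' speeds, which is the sense in which a graph convolution preserves spatial dependencies.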

Online Market Basket Analysis, 2020

The University of Melbourne

Market Basket Analysis (MBA) is a popular technique for identifying associations between products, which is crucial for business decision making. Nevertheless, existing MBA techniques typically fail to uncover rarely occurring associations among products at their most granular level, and they have limited ability to capture temporal dynamics in those associations. To address these gaps, this project proposed an online deep representation learning-based technique for market basket analysis. The proposed method effectively captures both rarely occurring strong associations and temporal changes in associations.
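
To make the limitation of conventional MBA concrete, the sketch below computes the classic support, confidence, and lift measures for item pairs and then applies a minimum-support filter. It illustrates the baseline technique only, not the project's representation learning method; the baskets, the 0.2 threshold, and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_metrics(baskets):
    """Return {(a, b): (support, confidence of a -> b, lift)} per item pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # de-duplicate, fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    metrics = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[a]
        # lift = P(a, b) / (P(a) * P(b)), written with counts for exactness
        lift = count * n / (item_counts[a] * item_counts[b])
        metrics[(a, b)] = (support, confidence, lift)
    return metrics


# Hypothetical toy data: 20 baskets, one common pair and one rare pair.
baskets = (
    [["bread", "milk"]] * 8
    + [["bread"]] * 4
    + [["milk"]] * 4
    + [["saffron", "paella_rice"]] * 2
    + [["eggs"]] * 2
)

metrics = pair_metrics(baskets)

# A typical minimum-support filter (threshold chosen for illustration).
MIN_SUPPORT = 0.2
frequent = {pair for pair, (support, _, _) in metrics.items()
            if support >= MIN_SUPPORT}
```

In this toy data the saffron and paella rice pair always occurs together and has a much higher lift than bread and milk, yet the support filter discards it because it appears in only 2 of 20 baskets. This is the kind of rare but strong association that support-based pruning tends to miss.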

Spatiotemporal Activity Modelling, 2020

The University of Melbourne

Building spatiotemporal models of people's activities in urban spaces is important for understanding the ever-increasing complexity of urban dynamics. This project devised an ML model based on online deep representation learning for spatiotemporal activity modelling using a Twitter stream. The proposed method was shown to be effective in downstream applications such as location/activity recommendation and event detection.

Word-level Language Prediction in Multilingual Text, 2019

The University of Melbourne and IEEE BigData-2019

This project developed the benchmark system for the BigData Cup Challenge "Understanding Multilingual Communities through Analysis of Code-switching Behaviors in Social Media Discussions" at IEEE BigData 2019. The system uses an ML model to predict word-level language labels for words in multilingual text. It adopts langdetect, a widely used language detection library, to produce weak word-level language labels, which are then fine-tuned using a CRF (Conditional Random Field) model.

Ranked 1st in IEEE BigData Cup Challenge 5

Information Extraction from Social Science Publications, 2019

Living Analytics Research Centre, Singapore Management University

This project developed a machine learning model to automate the discovery of research datasets, and the associated research methods and fields, in social science research publications. The proposed solution consists of a heuristics-based candidate phrase extraction module and a knowledge graph-based information machine module to extract dataset and research method mentions in scientific publications.

One of the top 5 teams in the Coleridge Rich Context Competition, 2018

Profiling Job Posts, 2018

Living Analytics Research Centre, Singapore Management University

This project developed a novel machine learning model to predict the vocational interest profiles of job posts from their text content. The proposed model adopts a domain-specific word representation learning technique to capture the word semantics of job posts from the perspective of the job domain, and a learning-to-rank (LTR) model to predict jobs' interest profiles.

This model is deployed in the application here to match users with relevant jobs.

Predicting Individuals' Personal Values from Word Usage, 2018

Living Analytics Research Centre, Singapore Management University

This project devised a machine learning model to predict the personal values of individuals, which shows a performance boost of around 45% over state-of-the-art methods. The proposed model explicitly exploits the community-specific word usage of users and the significant correlation among the personal value dimensions. It also explores the possibility of representing silent users in social media using other profile information.

Malware Signature Extraction from Cybersecurity Reports, 2018

StatNLP Research Team, Singapore University of Technology and Design

This project developed the benchmark system for SemEval-2018 Task 8: Semantic Extraction from CybersecUrity REports using Natural Language Processing (SecureNLP), which consists of four subtasks that incrementally predict the characteristics of a specific malware from cybersecurity reports.