Is a News? 

This is a web application that automatically discovers news URLs based on their content and a predefined URL database, which could be useful for downstream applications such as online misinformation detection and news domain identification. To classify URLs, the tool relies on a machine learning model that combines a lookup of known news and non-news domains with a content-based classifier trained on a labelled dataset of more than 20,000 URLs.
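The sketch below illustrates one plausible way a domain lookup and a content-based classifier could be combined; it is not the tool's actual implementation, which is not described here. The domain sets, the function names `host_of` and `is_news`, and the `content_classifier` callable are all hypothetical, and the lookup-first ordering is an assumption.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical lookup tables; the tool's real domain database is not shown here.
NEWS_DOMAINS = {"bbc.com", "reuters.com"}
NON_NEWS_DOMAINS = {"github.com", "wikipedia.org"}


def host_of(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_news(url: str, content_classifier: Callable[[str], bool]) -> bool:
    """Check the domain lookup first, then fall back to the classifier.

    `content_classifier` stands in for a trained content-based model;
    it receives the URL and returns True for news (assumed interface).
    """
    host = host_of(url)
    if host in NEWS_DOMAINS:
        return True
    if host in NON_NEWS_DOMAINS:
        return False
    # Unknown domain: defer to the content-based classifier.
    return content_classifier(url)
```

With this ordering, a URL on a listed domain never reaches the classifier, so the content model is only consulted for domains that the lookup does not cover.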

To access:

This is an open-source tool. If you find it useful in your research, please let me know how you used it.

O*NET Knowledge Database

This dataset contains occupational data crawled from the O*NET website. It covers specific information (i.e., summary, tasks, activities, and interest profiles) for 1110 occupations.

Files Description:

Download:

If you use this dataset, please cite the following paper:

Amila Silva, Pei-Chi Lo and Ee-Peng Lim, JPLink: On Linking Jobs to Vocational Interest Types, In Proceedings of the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2020), pp. 220-232, 2020

Singapore Personal Value Dataset

This resource includes two anonymized datasets, collected from 125 Facebook users ('facebook_dataset.json') and 85308 Twitter users ('twitter_dataset.json') in Singapore. Both datasets are in JSON format, where each entry in the JSON list corresponds to one user of that social network.
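As a minimal sketch of reading these files, the helper below loads a dataset and returns its list of per-user entries. It assumes only what is stated above (the top-level JSON value is a list); the function name `load_users` is hypothetical, and the fields inside each entry are not inspected because they are not described here.

```python
import json
from pathlib import Path


def load_users(path: str) -> list:
    """Load a dataset file and return its list of per-user entries."""
    with Path(path).open(encoding="utf-8") as handle:
        users = json.load(handle)
    # The files are described as JSON lists with one entry per user.
    if not isinstance(users, list):
        raise ValueError(f"expected a JSON list in {path}")
    return users
```

For example, `len(load_users("facebook_dataset.json"))` gives the number of user entries in the Facebook file, assuming it has been downloaded to the working directory.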

Files Description:

Download:

If you use this dataset, please cite the following paper:

Amila Silva, Pei-Chi Lo and Ee-Peng Lim, On Predicting Personal Values of Social Media Users using Community-Specific Language Features and Personal Value Correlation, In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM 2021)