A Pipeline for Analysing Grant Applications

by Shuaiqun Pan, et al.

Data mining techniques can transform massive amounts of unstructured data into quantitative data that quickly reveal insights, trends, and patterns in the original data. In this paper, a data mining model is applied to analyse the 2019 grant applications submitted to an Australian Government research funding agency, to investigate whether grant schemes successfully identify innovative project proposals, as intended. The grant applications are peer-reviewed research proposals that include specific “innovation and creativity” (IC) scores assigned by reviewers. In addition to predicting the IC score for each research proposal, we are particularly interested in understanding the vocabulary of innovative proposals. To this end, various data mining models and feature encoding algorithms are studied and explored. The best-performing model, which we propose, is a Random Forest (RF) classifier over documents encoded with features denoting the presence or absence of unigrams. Specifically, the unigram terms are encoded by a modified Term Frequency-Inverse Document Frequency (TF-IDF) algorithm that implements only the IDF part of TF-IDF. Besides the proposed model, this paper also presents a rigorous experimental pipeline for analysing grant applications, and the experimental results demonstrate its feasibility.
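The IDF-only encoding described above can be illustrated with a minimal sketch. It is not the authors' implementation. The tokenizer, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, the helper names `tokenize`, `fit_idf` and `encode`, and the toy documents are all assumptions made for illustration. The point it shows is that each unigram contributes its IDF weight if it is present in a document and 0 otherwise, so the number of times a term occurs has no effect.

```python
import math
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic unigrams only.
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Document frequency: number of documents containing each term.
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(tokenize(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # Assumed smoothed IDF; the abstract does not state the exact formula.
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


def encode(doc, idf, vocab):
    # Presence/absence encoding: IDF weight if the term occurs, else 0.
    # The term frequency (TF) part is deliberately ignored.
    present = set(tokenize(doc))
    return [idf[term] if term in present else 0.0 for term in vocab]


# Toy documents standing in for proposal text (invented for illustration).
docs = ["novel novel method", "standard method"]
idf = fit_idf(docs)
vocab = sorted(idf)
X = [encode(doc, idf, vocab) for doc in docs]
```

Each row of `X` is a fixed-length feature vector that could then be passed, together with labels derived from the IC scores, to a Random Forest classifier from a machine learning library.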

