


DOI: 10.7708/ijtte.2017.7(3).06

Volume 7, Issue 3, Pages 354-367


Kasthurirangan Gopalakrishnan - Department of Electrical Engineering and Computer Science, Northwestern University, USA

Siddhartha K. Khaitan - Department of Electrical and Computer Engineering, Iowa State University, USA


Research grant databases offer a wealth of information for studying research trends, research collaboration networks, and patterns of funding over time. Natural Language Processing (NLP) and Text Mining (TM), in combination with Machine Learning (ML), are well-suited data science tools for collecting and analyzing huge text corpora such as these databases and for unearthing interesting findings from them. At a time when transportation agencies across the globe face budgetary constraints and are asked “to do more with less”, extracting information from such databases to build predictive models that guide researchers and agencies has become very important. Understanding past patterns of funding and interest in various subject areas is also useful for PhD researchers formulating their research and for academic researchers seeking funding in general. We present a comprehensive study of the Transportation Research Board’s (TRB’s) Research in Progress (RIP) “big data”, which contains information on more than 14,000 current or recently completed projects funded in the past 25 years, mainly by the U.S. Department of Transportation (DOT) and State DOTs. We perform longitudinal studies using text mining pipelines to discover various interesting patterns and anomalies in the data. Finally, we develop a predictive model that uses text-mined information to predict the most appropriate funding agency for researchers working across various research areas to target.
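
To make the final step of the abstract concrete, the following is a minimal sketch of how text-mined project descriptions could be used to predict a funding agency. It is an illustration under stated assumptions, not the authors' model: the abstract does not name the classifier or features used, so a bag-of-words multinomial Naive Bayes classifier is swapped in here, and the class name `NaiveBayesAgencyModel`, the helper `tokenize`, the agency labels, and the four training records are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesAgencyModel:
    """Multinomial Naive Bayes with add-one smoothing.

    Illustrative stand-in only; the source does not specify its model.
    """

    def fit(self, texts, agencies):
        self.class_counts = Counter(agencies)    # documents per agency
        self.word_counts = defaultdict(Counter)  # token counts per agency
        self.vocab = set()
        for text, agency in zip(texts, agencies):
            tokens = tokenize(text)
            self.word_counts[agency].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_agency, best_score = None, -math.inf
        for agency, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[agency].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[agency][tok] + 1) / denom)
            if score > best_score:
                best_agency, best_score = agency, score
        return best_agency


# Hypothetical toy records: (project description, funding agency label).
TRAIN = [
    ("pavement asphalt overlay performance", "Agency A"),
    ("asphalt mixture rutting pavement design", "Agency A"),
    ("transit ridership bus scheduling", "Agency B"),
    ("bus rapid transit passenger demand", "Agency B"),
]

model = NaiveBayesAgencyModel().fit(
    [text for text, _ in TRAIN], [agency for _, agency in TRAIN]
)
prediction = model.predict("asphalt pavement cracking")
print(prediction)
```

A real pipeline over a database of this size would likely add stop-word removal, term weighting, and held-out evaluation; the sketch only shows the shape of the text-to-agency mapping.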



The authors sincerely acknowledge the Transportation Research Board (TRB), especially Ms. Janet S. Daly, for their help in facilitating the download of their publicly available Research in Progress (RIP) database. The authors would like to gratefully acknowledge Dr. Ayush Singhal, University of Minnesota at Minneapolis, for his sincere contributions and many fruitful discussions in the course of preparing this article.

