Hoa Khanh Dam homepage - Đàm Khánh Hòa

DeepSoft: Deep Learning for Software Engineering

Our Vision:

Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most existing work in software analytics still relies heavily on costly manual feature engineering, and it mainly addresses traditional classification problems rather than predicting future events. We present a vision for DeepSoft, an end-to-end generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the deep learning-based Long Short-Term Memory architecture, which is capable of learning the long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports.

Read more on DeepSoft in this paper:

Hoa Khanh Dam, Truyen Tran, John Grundy and Aditya Ghose, DeepSoft: A vision for a deep model of software, Proceedings of the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), Visions and Reflections Track, ACM Press, To Appear. (PDF)

Work In Progress:

A deep learning model for estimating story points:

Although there has been substantial research in software analytics for effort estimation in traditional software projects, little work has been done for estimation in agile projects, especially estimating user stories or issues. Story points are the most common unit of measure used for estimating the effort involved in implementing a user story or resolving an issue. In this paper, we offer for the first time a comprehensive dataset for story points-based estimation that contains 23,313 issues from 16 open source projects. We also propose a prediction model for estimating story points based on a novel combination of two powerful deep learning architectures: long short-term memory and recurrent highway network. Our prediction system is end-to-end trainable from raw input data to prediction outcomes without any manual feature engineering. An empirical evaluation demonstrates that our approach consistently outperforms three common effort estimation baselines and two alternatives in both Mean Absolute Error and the Standardized Accuracy.

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Aditya Ghose, Tim Menzies, A deep learning model for estimating story points, submitted to ICSE (PDF)
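
The two evaluation measures named in the abstract above can be sketched as follows. Mean Absolute Error is the average absolute difference between actual and estimated story points. Standardized Accuracy is assumed here to follow the commonly used definition SA = (1 - MAE / MAE_rguess) x 100, where MAE_rguess is the expected error of random guessing (predicting each issue with the actual value of another, randomly chosen issue). The function names and the sample values are hypothetical, and the paper's exact evaluation setup may differ.

```python
from itertools import permutations
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and estimated story points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def random_guess_mae(actual):
    """Expected MAE of random guessing (assumed baseline).

    Each issue is 'estimated' with the actual value of a different issue,
    so the expectation is the mean absolute difference over all ordered
    pairs of distinct positions.
    """
    if len(actual) < 2:
        raise ValueError("need at least two issues for the random baseline")
    return mean(abs(a - b) for a, b in permutations(actual, 2))


def standardized_accuracy(actual, predicted):
    """SA in percent: how much better the estimates are than random guessing."""
    baseline = random_guess_mae(actual)
    if baseline == 0:
        raise ValueError("all actual values are identical; SA is undefined")
    return (1 - mean_absolute_error(actual, predicted) / baseline) * 100


# Hypothetical story points for five issues (not taken from the dataset).
actual_points = [1, 2, 3, 5, 8]
estimated_points = [1, 3, 3, 5, 5]

mae = mean_absolute_error(actual_points, estimated_points)
sa = standardized_accuracy(actual_points, estimated_points)
print(f"MAE = {mae:.2f}, SA = {sa:.1f}%")
```

A lower MAE and a higher SA both indicate better estimates. An SA near zero means the model is doing no better than random guessing.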

A deep language model for software code:

Existing language models such as n-grams for software code often fail to capture a long context where dependent code elements scatter far apart. In this paper, we propose a novel approach to build a language model for software code to address this particular issue. Our language model, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term dependencies which occur frequently in software code. Results from our intrinsic evaluation on a corpus of Java projects have demonstrated the effectiveness of our language model. This work contributes to realizing our vision for DeepSoft, an end-to-end, generic deep learning-based framework for modeling software and its development process.

Hoa Khanh Dam, Truyen Tran, Trang Pham, A deep language model for software code, Workshop on Naturalness of Software, co-located with the 24th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE). (PDF)
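
The long-context limitation of n-gram models described above can be illustrated with a small sketch. An n-gram model predicts each token from only the n-1 tokens before it, so a dependent token that lies further back is invisible to the model. The tokenized Java fragment and the helper name below are hypothetical and serve only as an illustration; they are not taken from the paper's corpus or implementation.

```python
def ngram_context(tokens, position, n):
    """Return the n-1 preceding tokens an n-gram model conditions on."""
    start = max(0, position - (n - 1))
    return tuple(tokens[start:position])


# Hypothetical, whitespace-tokenized Java fragment.
tokens = (
    "try { reader = open ( path ) ; line = reader . read ( ) ; } "
    "catch ( IOException e ) { }"
).split()

catch_index = tokens.index("catch")

# A trigram model sees only the two tokens just before `catch`.
trigram_context = ngram_context(tokens, catch_index, 3)
print(trigram_context)
print("try" in trigram_context)

# Smallest n-gram order whose context window reaches back to `try`.
order_needed = catch_index - tokens.index("try") + 1
print(order_needed)
```

In this fragment the `try` keyword that makes `catch` likely is far outside the trigram window, and the order needed to reach it grows with the length of the block body. A recurrent model such as an LSTM carries a state across the whole sequence instead of using a fixed window.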

Predicting hazardous software components using deep learning:

This project aims to develop novel approaches based on machine learning (particularly deep learning) to predict which components of software systems contain safety-critical hazards. This project is funded by Samsung as part of its prestigious Global Research Outreach Program (http://media.uow.edu.au/news/UOW223834).

Hoa Khanh Dam, Truyen Tran, Trang Pham, Shien Wee Ng, John Grundy, Aditya Ghose, Automatic feature learning for vulnerability prediction, arXiv preprint arXiv:1708.02368
