Software Engineering Analytics research at University of Wollongong (SEA@UOW)

The pervasiveness of software products in all areas of society has resulted in millions of software projects (e.g. over 17 million active projects on GitHub) and a massive amount of data about their development, operation and maintenance (e.g. the Mozilla Firefox project, the well-known Web browser, has had over 300 releases and 1.5 million issue reports since its initial release in 2002). This software engineering data is generated continuously and rapidly in many forms, such as user stories, use cases, requirements specifications, issue and bug reports, source code, test cases, execution logs, app reviews, user and developer mailing lists, and discussion threads. Hidden in this Big Data are insights valuable to project managers, software engineers and other stakeholders about the quality of the development process and the software product, and the experience that software users receive.

Using cutting-edge machine learning and data mining techniques, our Software Engineering Analytics (SEA) research team aims to develop analytics technologies which specifically turn software engineering data into actionable insight. We believe that SEA will significantly improve the theory and practice of software engineering, enabling us to build better software and build software better, addressing both quality and productivity needs.

The SEA research program, led by Dr Hoa Khanh Dam, is part of UOW's Decision System Lab.

News

Dr Hoa Dam has been invited to visit and give a talk on SEA@UOW research at the Faculty of Information Technology, Monash University, in April 2018 - check this link for details

We will have two contributions at ICSE 2018: one paper in the ICSE NIER track and the other in the ICSE poster track.

Dr Hoa Dam has been invited to serve as Program Co-Chair for the 25th Australasian Software Engineering Conference (ASWEC 2018)

Paper on deep learning for story point estimation in agile development has been accepted to IEEE Transactions on Software Engineering.

Welcome Aziz (PhD student) and Jack (Honours student) to the SEA@UOW team!

One PhD scholarship is available for commencing in 2018.

TSE paper invited for a Journal-First paper presentation at ESEC/FSE 2017

UOW News: "Critical error: when machines go off-script"

People

Staff:

Students:

Morakot Choetkiertikul (PhD student)
Alexis Harper (PhD student)
Wisam Al-Zubaidi (PhD student)
Shien Wee NG (PhD student)
Abdulaziz Alhefdhi (PhD student)
Jack Humphreys (Honours student)
Huy Quan Ha (Summer intern)

Collaborators:

Truyen Tran (Deakin University)
Trang Pham (Deakin University)
John Grundy (Monash University)
Tim Menzies (North Carolina State University)
Taeksu Kim (SAMSUNG)

Projects

Below are some exemplar projects that we have done:

Deep learning for software engineering - see our DeepSoft and a project with Samsung.

Helping project managers in predicting delays: Late delivery and cost overruns have been a common problem in (software) projects for many years. Contributing to this problem is the lack of support for predicting, at any given stage in a project, which project tasks (among hundreds to thousands of tasks) are at risk of being delayed. Foreseeing such risks would allow project managers and software engineers to take prudent measures to assess and manage the risks, and consequently reduce the chance of their project being delayed. This work aims to provide automated support to enable such a prediction. We mined over 100,000 JIRA issues/tasks from large software projects, and built a number of highly accurate predictive models that can predict whether a task or issue will be delayed and, if so, by how much.

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran and Aditya Ghose, Predicting delay of issues with due dates in software projects, Empirical Software Engineering journal, Volume 22, Issue 3, pages 1223-1263, Springer, dx.doi.org/10.1007/s10664-016-9496-7
Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran and Aditya Ghose, Predicting delays in software projects using networked classification, Proceedings of 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 353 - 364, IEEE (acceptance rate 20.8%). (PDF)
Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran and Aditya Ghose, Characterization and prediction of issue-related risks in software projects, Proceedings of 12th International Conference on Mining Software Repositories (MSR), co-located with ICSE 2015, pages 280 - 291, IEEE (acceptance rate 30%). (ACM SIGSOFT Distinguished Paper Award) (PDF)

Helping agile teams in effort estimation: It has now become a common practice for agile teams to go through each user story and estimate the effort for completing it. Story points are commonly used as a unit of measure for specifying the overall effort of a user story. Currently, most agile teams rely heavily on experts’ subjective assessment (e.g. planning poker, analogy, and expert judgment) to arrive at an estimate. This may lead to inaccuracy and, more importantly, inconsistencies between estimates. A machine learner can help the team maintain consistency, especially in coping with increasingly large numbers of user stories. It does so by learning from past issues and their estimates to make future estimates. We mined over 23,000 user stories recorded in JIRA from 16 large software projects, and built a deep learning model (using a novel combination of the Long Short-Term Memory architecture and the Recurrent Highway Network) which recommends the effort of implementing a user story (in story points).

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Trang Pham, Aditya Ghose, Tim Menzies, A deep learning model for estimating story points, under review at IEEE Transactions on Software Engineering (TSE) (PDF)

Helping decision makers in delivery prediction: Iterative software development has become widely practiced in industry. Since modern software projects require fast, incremental delivery in every iteration of software development, it is essential to monitor the execution of an iteration and to foresee, as the iteration progresses, whether the team can deliver quality products. In this work, we developed a novel, data-driven approach to providing automated support for project managers and other decision makers in predicting delivery capability for an ongoing iteration. Our model was evaluated using 3,834 iterations/sprints and 56,687 issues recorded in JIRA, which we collected from 5 large software projects.

Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, Aditya Ghose, John Grundy, Predicting delivery capability in iterative software development, IEEE Transactions on Software Engineering (TSE), https://doi.org/10.1109/TSE.2017.2693989.

Mining social norms from open software projects: Social norms facilitate coordination and cooperation among individuals, thus enabling smoother functioning of social groups such as the highly distributed and diverse open source software development (OSSD) communities. In these communities, norms are mostly implicit and hidden in huge records of human-interaction information such as emails, discussion threads, bug reports, commit messages and even source code. This new line of research aims to extract social norms from the rich data available in software repositories.

Hoa Khanh Dam, Bastin Tony Roy Savarimuthu, Daniel Avery and Aditya Ghose, Mining Software Repositories for Social Norms, Proceedings of the 37th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track, pages 627 – 630, IEEE (acceptance rate 18.5%). (PDF)
Daniel Avery, Hoa Khanh Dam, Bastin Tony Roy Savarimuthu and Aditya Ghose, Externalization of Software Behavior by the Mining of Norms, Proceedings of the 13th International Conference on Mining Software Repositories (MSR), co-located with ICSE 2016, pages 223-234, ACM (acceptance rate: 27%).
Morakot Choetkiertikul, Daniel Avery, Hoa Khanh Dam, Truyen Tran and Aditya Ghose, Who will answer my question on Stack Overflow?, Proceedings of the 24th Australasian Software Engineering Conference (ASWEC), pages 155 - 164, IEEE

Using the same data-driven analytics approach, we can also build predictive models and recommendation systems for tasks including, but not limited to, the following:

Helping developers locate which part of their codebase should be examined to resolve an issue (issue/bug localization).
Recommending which issues should be resolved for the next release (i.e. the next release problem).
Determining if a new issue is a duplicate of an existing issue. Also, determining if a new issue is related to (e.g. dependent on) existing issues.
Recommending who is the best person in a team to resolve an issue.
Predicting the time of resolving an issue.
Building statistical language models for source code to support code suggestion, code migration, etc.
Predicting delays in software projects.
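
To give a flavour of the statistical language model item above, here is a minimal sketch of a bigram model over code tokens that suggests likely next tokens. It is an illustrative toy only, not one of our published models: the function names `train_bigram_model` and `suggest_next`, the whitespace tokenisation and the tiny corpus are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(token_sequences):
    """Count how often each token is followed by each other token."""
    follower_counts = defaultdict(Counter)
    for tokens in token_sequences:
        # Pair every token with its immediate successor.
        for prev_token, next_token in zip(tokens, tokens[1:]):
            follower_counts[prev_token][next_token] += 1
    return follower_counts


def suggest_next(model, prev_token, k=3):
    """Return up to k (token, relative frequency) pairs that follow prev_token."""
    counts = model.get(prev_token)
    if not counts:
        return []  # token never seen as a predecessor
    total = sum(counts.values())
    return [(token, n / total) for token, n in counts.most_common(k)]


# Hypothetical toy corpus: code lines split on whitespace.
corpus = [
    "for i in range ( n ) :".split(),
    "for x in items :".split(),
    "for key in range ( 10 ) :".split(),
]
model = train_bigram_model(corpus)

# Suggestions for the token following "in"; "range" ranks first here
# because it follows "in" in two of the three toy lines.
print(suggest_next(model, "in"))
```

A real code-suggestion model would typically need a proper lexer, a much larger corpus, smoothing for unseen token pairs and longer contexts than a single preceding token.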

Please feel free to contact us, hoa (at) uow (dot) edu (dot) au, if you are interested in doing an industry project and/or a research project (for Honours, Master and PhD) in this area.

Datasets

All the datasets used in our publications are made publicly available here. If you use our datasets, please cite our relevant paper in your publication.