Software Engineering Analytics research at University of Wollongong (SEA@UOW)

The pervasiveness of software products in all areas of society has resulted in millions of software projects (e.g. over 17 million active projects on GitHub) and a massive amount of data about their development, operation and maintenance (e.g. the well-known Web browser project, Mozilla Firefox, has accumulated over 300 releases and 1.5 million issue reports since its initial release in 2002). This software engineering data is generated continuously and rapidly in many forms, such as user stories, use cases, requirements specifications, issue and bug reports, source code, test cases, execution logs, app reviews, user and developer mailing lists, and discussion threads. Hidden in this Big Data are insights valuable to project managers, software engineers and other stakeholders about the quality of the development process and the software product, and about the experience that software users receive.

Using cutting-edge machine learning and data mining techniques, our Software Engineering Analytics (SEA) research team aims to develop analytics technologies that turn software engineering data into actionable insight. We believe that SEA will significantly improve the theory and practice of software engineering, enabling us to build better software and build software better, addressing both quality and productivity needs.

The SEA research program, led by Dr Hoa Khanh Dam, is part of UOW's Decision System Lab.


  • Dr Hoa Dam has been invited to visit and give a talk on SEA@UOW research at the Faculty of Information Technology, Monash University, in April 2018 - check this link for details
  • We will have two contributions at ICSE 2018: one paper in the ICSE NIER track and the other in the ICSE poster track.
  • Dr Hoa Dam has been invited to serve as Program Co-Chair for the 25th Australasian Software Engineering Conference (ASWEC 2018)
  • Paper on deep learning for story point estimation in agile development has been accepted to IEEE Transactions on Software Engineering.
  • Welcome Aziz (PhD student) and Jack (Honours student) to the SEA@UOW team!
  • One PhD scholarship is available for commencing in 2018.
  • TSE paper invited for a Journal-First paper presentation at ESEC/FSE 2017
  • UOW News: "Critical error: when machines go off-script"






  • Morakot Choetkiertikul (PhD student)
  • Alexis Harper (PhD student)
  • Wisam Al-Zubaidi (PhD student)
  • Shien Wee NG (PhD student)
  • Abdulaziz Alhefdhi (PhD student)
  • Jack Humphreys (Honours student)
  • Huy Quan Ha (Summer intern)



Below are some example projects that we have completed:

  1. Deep learning for software engineering - see DeepSoft and our project with Samsung.
  1. Helping project managers in predicting delays: Late delivery and cost overruns have been common problems in (software) projects for many years. Contributing to these problems is the lack of support for predicting, at any given stage in a project, which project tasks (among hundreds to thousands of tasks) are at risk of being delayed. Foreseeing such risks would allow project managers and software engineers to take prudent measures to assess and manage them, and consequently reduce the chance of their project being delayed. This work aims to provide automated support for such predictions. We mined over 100,000 JIRA issues/tasks from large software projects, and built a number of highly accurate predictive models that predict whether a task or issue will be delayed and, if so, the degree of delay.
  1. Helping agile teams in effort estimation: It has become common practice for agile teams to go through each user story and estimate the effort needed to complete it. Story points are commonly used as a unit of measure for the overall effort of a user story. Currently, most agile teams rely heavily on experts' subjective assessment (e.g. planning poker, analogy, and expert judgment) to arrive at an estimate. This may lead to inaccuracy and, more importantly, inconsistencies between estimates. A machine learner can help the team maintain consistency, especially when coping with increasingly large numbers of user stories. It does so by learning from past issues and their estimates to make future estimates. We mined over 23,000 user stories recorded in JIRA from 16 large software projects, and built a deep learning model (using a novel combination of the Long Short Term Memory architecture and the Recurrent Highway Network) which recommends the effort of implementing a user story (in story points).
  1. Helping decision makers in delivery prediction: Iterative software development has become widely practiced in industry. Since modern software projects require fast, incremental delivery in every iteration of software development, it is essential to monitor the execution of an iteration and to foresee the team's capability to deliver quality products as the iteration progresses. In this work, we developed a novel, data-driven approach to providing automated support for project managers and other decision makers in predicting delivery capability for an ongoing iteration. Our model was evaluated using 3,834 iterations/sprints and 56,687 issues recorded in JIRA, which we collected from 5 large software projects.
  1. Mining social norms from open software projects: Social norms facilitate coordination and cooperation among individuals, thus enabling smoother functioning of social groups such as the highly distributed and diverse open source software development (OSSD) communities. In these communities, norms are mostly implicit and hidden in huge records of human-interaction information such as emails, discussion threads, bug reports, commit messages and even source code. This new line of research aims to extract social norms from the rich data available in software repositories.
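
To give a feel for how a learner can reuse past estimates, as described in the effort estimation project above, here is a minimal Python sketch. It is not our deep learning model: it swaps in a much simpler nearest-neighbour baseline that averages the story points of the most textually similar past stories. The function names, the Jaccard word-overlap similarity, and the toy history are all assumptions made for illustration only.

```python
import re
from statistics import mean


def tokenize(text):
    """Lowercase a story description and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_story_points(new_story, history, k=2):
    """Average the story points of the k past stories most similar to new_story.

    history is a list of (text, story_points) pairs.
    This is an illustrative baseline, not the LSTM/Recurrent Highway Network model.
    """
    if not history:
        raise ValueError("history must contain at least one estimated story")
    new_tokens = tokenize(new_story)
    ranked = sorted(
        history,
        key=lambda item: jaccard(new_tokens, tokenize(item[0])),
        reverse=True,
    )
    return mean(points for _, points in ranked[:k])


# Hypothetical toy history, invented for this example (not from our datasets).
HISTORY = [
    ("Add login page with password reset", 5),
    ("Fix typo in login page footer", 1),
    ("Implement password reset email", 3),
    ("Migrate database to new schema", 8),
]

if __name__ == "__main__":
    print(estimate_story_points("Add password reset to login page", HISTORY))
```

Because every new story is scored against the same history with the same rule, two similar stories receive similar estimates, which is the consistency benefit mentioned above. A realistic model would need far richer text representations than word overlap.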

Using the same data-driven analytics approach, we can also build predictive models and recommendation systems for the following (among others):

Please feel free to contact us at hoa (at) uow (dot) edu (dot) au if you are interested in doing an industry project and/or a research project (for Honours, Master's and PhD) in this area.


All the datasets used in our publications are made publicly available here. If you use our datasets, please cite the relevant paper in your publication.