Institute Projects

Bringing together research in Data Privacy and Data Science with a foundation of Software Engineering, our projects focus on developing data-driven applications for the public good.



We are undertaking a number of projects with students and recent graduates who are gaining experience in real-world development life cycles. The projects focus on the design and development of artefacts whose outcomes will have an impact on society.



Below is a summary of our ongoing projects, followed by individual student project opportunities that would suit students studying in the courses COMP8755, COMP4560, and COMP3740.

Data Privacy Using SOLID PODs


In SII we are data scientists and software engineers with a keen focus on research and development in data privacy. We are developing digital infrastructure around the SolidCommunity, which deploys the Community Solid Server, to support the development of applications and an ecosystem for PODs (Personal Online Data stores) based on Solid (SOcial LInked Data). PODs give individuals access to, and personal control over, their own data rather than automatically surrendering their private data to centralised data stores owned by others. The infrastructure is based on the work of the Solid Project under the leadership of Tim Berners-Lee, the inventor of the World Wide Web.



Our collaborations with the Yarrabah Community’s Gurriny Health Clinic, and through a clinical trial deploying medical devices to support people with depression, build on our expertise in machine learning and AI, with a focus on open source packages and applications. We are developing a local ecosystem to support community development of POD-based technology.



We are building capability for our next generation of software engineers and data scientists. To do so we supervise student projects that develop open-source applications that pioneer new capabilities in decentralised data and computing for responsible software engineering and machine learning.



Resources include:


• An open access Solid server (solidcommunity) where you can create your own pod at the experimental Australian Solid Community.



Please reach out to us if you are interested in enrolling in a project with the Software Innovation Institute.







Enquiries to Prof Graham Williams

Student Projects 2025 S1

Using a RAG model over Solid Pods in Flutter


Retrieval-Augmented Generation (RAG) improves the output of large language models such as ChatGPT by having the model draw on data retrieved from an authoritative collection, here the user’s own data. Such a model is well suited to Solid Pods, where an individual’s personal data is available only within their own Pod, helping to ensure the privacy of the resulting model and its application.



This project begins with research on the available open source, locally run large language models that support RAG. Having chosen one such model that can run on a user’s desktop, the project will develop a Flutter/Dart app that retrieves the user’s data to augment the model’s output, personalising it to the individual.
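The retrieval and augmentation steps of RAG can be sketched as follows. This is a minimal, hypothetical Python illustration (the app itself is planned in Flutter/Dart): a simple bag-of-words cosine similarity stands in for the embedding model a real RAG system would use, the function names `retrieve` and `build_prompt` are our own, and the call to the locally run LLM is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list:
    """Lower-case a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Retrieval step: return the k documents most similar to the query."""
    q = Counter(tokenise(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenise(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list, k: int = 2) -> str:
    """Augmentation step: place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

For example, with `pod_docs` holding a few short notes read from a user's Pod, `build_prompt("When is my appointment?", pod_docs, k=1)` places the appointment note ahead of the question, and the resulting prompt would then be passed to the chosen local model.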



This activity is suitable for a 12 unit individual project, for a student interested in developing practical experience in large language models, software engineering, software development practices, technology development, and data science.




Enquiries to Prof Graham Williams

Simplifying Access to AI Models


The Machine Learning Hub provides open access to a growing selection of demonstrators and tools from state-of-the-art Artificial Intelligence (AI), Machine Learning, and Data Science. The tools allow the technology to be explored and utilised to build powerful command pipelines. The growing collection of MLHub packages showcases specific capabilities that can be linked together.



The MLHub Flutter package provides easy-to-use GUI access to a growing number of AI models.



This project will port an existing git repository that implements an AI capability, drawn either from ANU research projects or from generally available open source repositories. A focus this year will be choosing LLMs to run locally. You will add the meta-data required by MLHub to the repository and develop simple Python or R scripts using the MLHub framework. As you implement the capabilities in MLHub you may also take the opportunity to enhance the open source MLHub framework itself. The new functionality will then be incorporated into the MLFlutter app to provide easy access to the new capabilities.



The project is suitable for students interested in exploring state-of-the-art technology and building capability for others to utilise sophisticated AI tools easily. You will build MLHub packages from already existing open source repositories and learn the basics of Flutter through template coding.



This activity is suitable for a 12 unit individual project, for a student interested in developing practical experience with AI technology, and making that technology more readily accessible through Flutter. It will primarily be undertaken in Python and Flutter/Dart.




Enquiries to Prof Graham Williams

Privacy-Preserving Record Linkage in the Presence of Missing Data


Privacy-preserving record linkage (PPRL) is the process of identifying records that refer to the same real-world entity across data sources held by different organisations, while preserving the privacy of the entities whose records are being linked. These data sources often contain sensitive information about entities that needs to be protected. Because unique entity identifiers are lacking across the data sources, linkage is usually based on quasi-identifying attributes such as names and addresses.



Handling missing values is a major challenge in many data processing domains, and PPRL is no exception. Missing values in quasi-identifying attributes can lower the quality of linking records across databases.



The proposed project aims to investigate ways of overcoming the problem of missing values in PPRL to achieve high linkage quality. It explores the idea of using similar records in a data source to estimate the similarities of records that contain missing values. The project will also investigate how different encoding techniques can be used to calculate attribute and relational similarities between records in the presence of missing values, in order to identify matches and non-matches between different data sources in a privacy-preserving way.
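A common building block in PPRL is to encode each quasi-identifier's q-grams into a Bloom filter and compare filters with the Dice coefficient. The sketch below assumes that encoding, with hypothetical parameter values and a hypothetical secret key shared by the data owners for keyed hashing, and shows only a simple baseline for missing values: a missing attribute is skipped and the remaining attribute weights are renormalised. It does not implement the project's proposed idea of estimating missing similarities from similar records.

```python
import hashlib
import hmac

BF_LEN = 1000      # hypothetical Bloom filter length in bits
NUM_HASH = 10      # hypothetical number of hash functions per q-gram
SECRET_KEY = b"hypothetical-key-agreed-by-data-owners"


def qgrams(value: str, q: int = 2) -> set:
    """Split a padded, lower-cased string into overlapping q-grams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def _keyed_hash(gram: str, digest) -> int:
    """Keyed (HMAC) hash of a q-gram, as an integer."""
    return int(hmac.new(SECRET_KEY, gram.encode(), digest).hexdigest(), 16)


def bloom_encode(value: str) -> set:
    """Return the set of Bloom filter bit positions set to 1 for a value.

    Double hashing: position_i = (h1 + i * h2) mod BF_LEN.
    """
    bits = set()
    for gram in qgrams(value):
        h1 = _keyed_hash(gram, hashlib.sha256)
        h2 = _keyed_hash(gram, hashlib.sha1)
        for i in range(NUM_HASH):
            bits.add((h1 + i * h2) % BF_LEN)
    return bits


def dice(a: set, b: set) -> float:
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def record_similarity(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted mean of per-attribute Dice similarities.

    An attribute missing (None) in either record is skipped and the remaining
    weights are renormalised, instead of scoring the missing attribute as 0.
    """
    total, weight_sum = 0.0, 0.0
    for attr, w in weights.items():
        if rec_a.get(attr) is None or rec_b.get(attr) is None:
            continue
        total += w * dice(bloom_encode(rec_a[attr]), bloom_encode(rec_b[attr]))
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

If two records agree exactly on first and last name but one lacks a suburb, this baseline scores them 1.0, whereas scoring the missing suburb as 0 with equal weights would give 2/3. Neither choice uses the information in other, similar records, which is the gap the project sets out to address.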



Enquiries to Dr Anushka Vidanage

Privacy-Preserving Active Learning for Data Linkage


Data linkage is the process of identifying matching data points (or records) across multiple data sources, which is required in a variety of applications such as cybersecurity and health analytics. Because training data is generally not available, machine learning algorithms for data linkage depend significantly on domain experts manually labelling records to achieve accurate linkage. Real-world applications also face several constraints, including cost constraints, privacy constraints, and fairness constraints, which can lead to poor linkage quality and privacy issues.



In this project, we aim to study privacy-preserving active learning subject to the above three constraints. Privacy-preserving active learning techniques should not only reduce the cost of labelling, by selecting informative data samples to be labelled, but also reduce the privacy risk of re-identification. The project has the following objectives:

  • Compare existing algorithms for selecting data samples for manual labelling.
  • Develop novel active learning algorithms using privacy-enhancing technologies such as differential privacy mechanisms.
  • Design fairness-aware algorithms for privacy-preserving active learning.
  • Evaluate (both theoretically and empirically) the trade-offs among privacy, linkage accuracy, linkage fairness, and cost achieved by the designed algorithms.
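One way to make the sample-selection step of active learning privacy-aware is the report-noisy-max idea from differential privacy: add Laplace noise to each candidate's informativeness score and pick the highest noisy score. The sketch below is a hypothetical illustration using uncertainty sampling on a classifier's match probabilities. The function names and the `sensitivity` value are assumptions, and the noise scale needed for a formal ε-differential privacy guarantee depends on the actual sensitivity of the scores, which is not analysed here.

```python
import random
from typing import Optional


def uncertainty(p_match: float) -> float:
    """Uncertainty of a match probability: 1 at p = 0.5, 0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_match - 1.0)


def laplace(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def select_for_labelling(probs: list, epsilon: float,
                         sensitivity: float = 1.0, seed: Optional[int] = None) -> int:
    """Report-noisy-max: index of the record pair to send for manual labelling.

    Laplace noise with scale sensitivity / epsilon is added to each uncertainty
    score before taking the argmax, so a smaller epsilon gives more noise
    (stronger privacy) and a less informative choice.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    noisy = [uncertainty(p) + laplace(scale, rng) for p in probs]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

With a very large `epsilon` the noise is negligible and the most uncertain pair (probability 0.5) is chosen; smaller values trade labelling informativeness for privacy, which is one of the trade-offs the project aims to evaluate.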



Enquiries to Dr Anushka Vidanage

Privacy-Preserving Knowledge Graph Merging/Linking and Querying


Knowledge graphs (KGs) are being constructed and used by a growing number of organisations because of their ability to connect different types of data in meaningful ways and support rich data services. A KG is a heterogeneous graph composed of entities (nodes) and relations (edges), and some KGs also include properties (features) and labels for entities. However, the capabilities of a KG are limited to the data within a single organisation, because privacy and security concerns hinder integrating KGs across organisations.



Within this project, we aim to explore how KGs held by different organisations can be integrated and utilised using privacy-enhancing technologies such as differential privacy. We consider two important problems arising from this isolation of KGs:

  • Privacy-preserving knowledge graph merging – addresses the problem of identifying common entities across KGs using perturbation techniques while balancing the privacy and utility trade-off.
  • Privacy-preserving knowledge graph querying – addresses the problem of querying over securely stored graph databases. Given multiple private knowledge graphs from different organisations, it is challenging to query the graphs to find matching entities across them while protecting the privacy of the matched entities. The project aims to solve this challenge.
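As a hypothetical illustration of perturbation for KG merging, suppose each organisation represents every entity by a fixed-length bit signature (how signatures are derived from names or graph neighbourhoods is left open here). Before sharing, each bit is flipped with probability f (randomised response), which gives a per-bit local differential privacy level of ε = ln((1 - f)/f); the receiving party then matches entities by Hamming similarity. The function names and the threshold are assumptions, and this is one simple perturbation technique rather than the project's method.

```python
import math
import random
from typing import Optional


def randomised_response(bits: list, flip_prob: float, seed: Optional[int] = None) -> list:
    """Flip each bit independently with probability flip_prob before sharing."""
    rng = random.Random(seed)
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def per_bit_epsilon(flip_prob: float) -> float:
    """Local DP level of randomised response for one bit (requires 0 < flip_prob < 0.5)."""
    return math.log((1 - flip_prob) / flip_prob)


def hamming_similarity(a: list, b: list) -> float:
    """Fraction of positions on which two equal-length bit vectors agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def match_entities(kg_a: dict, kg_b: dict, threshold: float) -> list:
    """Pair each entity of kg_a with its most similar entity of kg_b, if above threshold."""
    matches = []
    for ea, sig_a in kg_a.items():
        eb, sig_b = max(kg_b.items(), key=lambda item: hamming_similarity(sig_a, item[1]))
        score = hamming_similarity(sig_a, sig_b)
        if score >= threshold:
            matches.append((ea, eb, score))
    return matches
```

Raising f strengthens privacy but makes true matches look less similar, which is the privacy and utility trade-off described above.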



Enquiries to Dr Anushka Vidanage

Simplifying User Access to AI


This project is currently in progress



The Machine Learning Hub is a repository of packages designed to quickly demonstrate, through a command line interface, Artificial Intelligence, Machine Learning, and Data Science capabilities. Current capabilities include speech to text, text to speech, translation, and much more.



This project will design and implement a unified graphical user interface (GUI) to support simple access to sophisticated state-of-the-art AI. You will develop the app for multiple platforms using the modern Flutter framework from Google. The app will build on and directly use the command line capabilities of MLHub, seamlessly orchestrating the input, output and command line arguments through the GUI. It will be simple to install onto the user’s own desktop, whether Linux, macOS, or Windows.



As the developer, you will identify specific and useful new capabilities, for example large language models, to be added to the mlhub suite, each as a new mlhub package. Each package can be installed and run locally from a fork of the original open source repository available through GitHub. The MLHUB.yaml files and Python support scripts will be implemented within your fork.



You can review and explore some of the technology available through MLHub in the MLHub Desktop Survival Guide, and choose a subset of these AI tools for the GUI use case. Previous experience with Flutter is not required.



This activity is suitable for a 12 unit individual project, for a student interested in developing skills in bringing technology to the broader population, particularly recent artificial intelligence developments.




Enquiries to Prof Graham Williams

A New Generation Rattle Data Science User Interface


This project is currently in progress



Rattle is a data mining and data science app that has been used in government, industry, and teaching for over 15 years. It is written in R, the statistical programming language used widely by data scientists. It is also used at the ANU in a number of our courses.



The user interface is now quite dated and ready for an update. This project is using Flutter/Dart to replace the current interface whilst maintaining the underlying dependence on R. It offers an opportunity for data science user interface design, for managing inter-process communications, and for dealing with large datasets. This is an opportunity for a student to review the state of the art in data science and to deliver a tool to interact with and explore data. This will require learning Flutter, Dart, and some R.



This activity is suitable for a 12 unit individual project, for a student interested in developing practical experience in software engineering, software development practices, technology development, and data mining and data science.




Enquiries to Prof Graham Williams

Projects Completed

Solid Health using PODs for Personal Data Privacy


A key research and development vision for us in SII is a future of decentralised web computing and data privacy. We share this vision through our collaborative, community-led project with the Gurriny Yealamucka Health Services Aboriginal Corporation. Gurriny supports the Yarrabah community in Far North Queensland. Community members each have a POD hosted on the Yarrabah Solid Server containing their own private health data, some from the Gurriny clinic itself, some collected by the individuals, and some from various other sources including My Health Record, pathology and imaging services, hospital admissions data, and pharmacies.



For the first time we have the opportunity to bring together all of an individual’s private health data from disparate sources. The data is collected into a single location known as the individual’s personal online data store (POD). By default the data is encrypted, and it is shared only as the individual decides. An ecosystem of applications (apps) can be openly and freely developed, for example to analyse the individual’s data on their own device and provide a holistic picture for improving individual health outcomes.



In general, data can be shared from a POD, under the individual’s control, with other apps and PODs. Such sharing can produce aggregated datasets that can be analysed, and from which machine learning models can be developed, to support community-wide outcomes and policy development. The results from aggregated analyses can be provided back to individuals’ PODs so that individuals can see how their data has contributed to community outcomes and policy directions, and can receive specialised, individual advice.



Below we see a sample app developed for use by the Gurriny health care professionals who travel throughout the community visiting community members. The app provides the health care professional ready access to the community member’s POD and the data the member agrees to share, and allows the health care professional to undertake particular Medicare activities.






The apps are implemented in Flutter and can access the individual POD data that sit on any SOLID server, whether locally or in the cloud.




Enquiries to Prof Graham Williams

STUDENT PROJECT Privacy Oriented Location Data Sharing and Analysis Using SOLID


This artefact-oriented software engineering project has already implemented a location-based data recording app. Flutter/Dart was used for the front end, with Solid server technology as the backend to store data in a privacy-focussed way. As a live app it actively collects location pings and allows a survey to be completed at specific locations. The location data collected by the app is mapped using OpenStreetMap.



An opportunity exists for a new student project to extend the current app with the capability to share a user’s data from their POD with “friends”, aggregating the friends’ data for privacy-preserving location analysis. This will require exploring various models for managing access controls, and implementing one.
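The Solid specifications describe Web Access Control (WAC), alongside Access Control Policy (ACP), as a way of managing who may access a resource in a POD. A minimal sketch, assuming WAC, of generating an `.acl` document that keeps full control for the owner and grants friends read-only access to a location data resource is shown below. The WebIDs and resource URL are hypothetical, and writing the document to the server (which requires an authenticated request) is omitted.

```python
ACL_PREFIX = "@prefix acl: <http://www.w3.org/ns/auth/acl#>.\n"


def authorization(name: str, agents: list, resource: str, modes: list) -> str:
    """One acl:Authorization block in Turtle granting `modes` on `resource` to `agents`."""
    agent_list = ", ".join(f"<{a}>" for a in agents)
    mode_list = ", ".join(f"acl:{m}" for m in modes)
    return (f"\n<#{name}>\n"
            f"    a acl:Authorization;\n"
            f"    acl:agent {agent_list};\n"
            f"    acl:accessTo <{resource}>;\n"
            f"    acl:mode {mode_list}.\n")


def location_acl(resource: str, owner: str, friends: list) -> str:
    """WAC document: full control for the owner, read-only access for friends."""
    doc = ACL_PREFIX + authorization("owner", [owner], resource, ["Read", "Write", "Control"])
    if friends:  # an empty acl:agent list would not be valid Turtle
        doc += authorization("friends", friends, resource, ["Read"])
    return doc


# Hypothetical example identifiers
print(location_acl(
    "https://pods.example.org/alice/location/pings.ttl",
    "https://pods.example.org/alice/profile/card#me",
    ["https://pods.example.org/bob/profile/card#me"],
))
```

Revoking a friend's access then amounts to regenerating the document without their WebID; comparing this per-resource model with alternatives such as ACP is part of exploring access-control models.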



This activity is suitable for a 12 unit individual project, for a student interested in developing practical experience in software engineering, technology development, and privacy-oriented data collection and sharing.




Enquiries to Prof Graham Williams

STUDENT PROJECT Data Privacy Chat App using Flutter and PODs


A POD is a personal/private online data store designed to ensure our data remains under our control. To help us manage our own private data, the SOLID project provides a server for privately storing application data. Google’s Flutter, meanwhile, has become a popular mobile app development framework and is also being adopted for desktop and web app development.



This project will explore opportunities for implementing chat functionality in Flutter using PODs to ensure privacy. The app will store its data (chats) in the cloud on PODs hosted on a SOLID server. Chats can be created, edited, indexed, searched, and tagged. They will be stored on the SOLID server conforming to an RDF schema.
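To illustrate what storing chats conforming to an RDF schema might look like, the sketch below serialises a single message as Turtle. The choice of schema.org terms (`schema:Message`, `schema:sender`, `schema:text`, `schema:dateSent`, and `schema:keywords` for tags) is an assumption for illustration rather than the app's actual schema, and the WebID used in any example is hypothetical.

```python
from datetime import datetime, timezone


def turtle_escape(text: str) -> str:
    """Escape characters that may not appear raw inside a Turtle "..." literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def chat_message_turtle(msg_id: str, sender_webid: str, text: str,
                        tags: list, sent: datetime) -> str:
    """Serialise one chat message as Turtle using schema.org terms (assumed vocabulary)."""
    lines = [
        "@prefix schema: <https://schema.org/>.",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.",
        "",
        f"<#{msg_id}>",
        "    a schema:Message;",
        f"    schema:sender <{sender_webid}>;",
        f'    schema:text "{turtle_escape(text)}";',
        f'    schema:dateSent "{sent.isoformat()}"^^xsd:dateTime',
    ]
    if tags:
        # Tags become a comma-separated object list of schema:keywords literals.
        keywords = ", ".join(f'"{turtle_escape(t)}"' for t in tags)
        lines[-1] += ";"
        lines.append(f"    schema:keywords {keywords}")
    lines[-1] += "."  # terminate the statement about this message
    return "\n".join(lines) + "\n"


print(chat_message_turtle("msg1", "https://pods.example.org/alice/profile/card#me",
                          "See you at 10?", ["meeting"],
                          datetime(2025, 3, 1, 9, 30, tzinfo=timezone.utc)))
```

The resulting document would then be written to the chat's container on the Pod, for example with an authenticated HTTP `PUT` using the `text/turtle` content type.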



The project is suitable for a student interested in exploring the latest approaches to app development, utilising modern software engineering practices, coupled with pioneering work to integrate privacy as a matter of course into our apps.



This activity is suitable for a 12 unit individual project, for a student interested in developing a practical application whilst experiencing an end-to-end software engineering project, with privacy focused technology.




Enquiries to Prof Graham Williams

Analysing the Carbon Emissions from ANU Travel


A key strategic target of the university is to achieve net zero emissions by 2025 for ANU direct on-campus activities, energy, business travel and waste. SII is analysing the 2019 ANU travel emissions to determine a baseline measure of ANU travel emissions (before the disruption during the COVID-19 pandemic). The objectives of this project are to:

  • Find a baseline of emissions for University-related travel in 2019 (and possibly other years)
  • Map the landscape of University-related travel habits across ANU

SII worked with the Below Zero Program to compile an inventory of 2019 travel emissions and to analyse the deidentified travel data to understand business travel habits and the resulting emissions across ANU. The project will also build a web application to allow stakeholders to inspect the 2019 travel emissions. These analyses will be implemented in Python and Dash.
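A distance-based calculation is one common way of estimating a travel emissions baseline: compute each trip's great-circle distance, multiply by an emission factor per passenger-kilometre, and sum over trips. The sketch below illustrates only that arithmetic. The emission factor is a placeholder rather than an official value, and the code does not reproduce the methodology actually used with the Below Zero Program.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
PLACEHOLDER_FACTOR = 0.15  # hypothetical kg CO2-e per passenger-km, NOT an official factor


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_emissions_kg(distance_km: float, factor_kg_per_pkm: float,
                      return_trip: bool = True) -> float:
    """Emissions for one traveller: distance times factor, doubled for a return trip."""
    return distance_km * factor_kg_per_pkm * (2 if return_trip else 1)


def baseline_tonnes(trips: list) -> float:
    """Total tonnes CO2-e for (lat1, lon1, lat2, lon2, return_trip) tuples."""
    total_kg = sum(
        trip_emissions_kg(great_circle_km(lat1, lon1, lat2, lon2), PLACEHOLDER_FACTOR, rt)
        for lat1, lon1, lat2, lon2, rt in trips
    )
    return total_kg / 1000.0
```

In a real inventory the factor would vary by travel mode, distance band, and class, and would come from a published source; keeping it a parameter makes that substitution straightforward.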







Enquiries to Prof Graham Williams

COVID Epidemiology Data Science App


The CRISPER App monitored COVID-19 cases, hospitalisations, deaths, and ICU admissions in Australia. The app utilised the Crisper Data Engine, which scrapes the latest data from the different Australian jurisdictions. It targets browser deployment on mobile devices but can be used on desktops. It also runs natively on Android, Linux, Windows, macOS, and iOS. The project uses the modern Flutter framework from Google for the front end, and the backend data engine is implemented in Python.







Enquiries to Prof Graham Williams

Coding for Brain Stimulation


We’re supporting colleagues in the ANU School of Medicine who are conducting clinical trials of non-invasive electrical stimulation of the brain as a treatment for depression and Alzheimer’s disease. Patients use a headset controlled by a handheld controller device, on which a treatment pattern is pre-programmed when the device is fitted in the clinic. Patients then take the device home and use the headset as prescribed until their next clinical appointment. The software in the device controls the hardware, determining how the device operates, how clinicians set the treatment, and how they download the treatment log files.

This project involves linking all the software components into a single program, implementing error handling, conducting testing, and optimising performance to ensure the device is ready for use in early 2023 trials, with refinement and improvement work possible through 2023. This is a great opportunity to work with the cross-disciplinary medical and computing team developing pioneering new treatments for mental illnesses. You will develop the skills to write optimised hardware control code in a resource-constrained environment, given the small chip size and limited resources of the device. The device uses MicroPython, a minimalist implementation of Python with all of the packages needed for this work.



This activity is suitable for coursework or HDR students who are strong python programmers and available for 2+ days per week for 2-3 months, with other project opportunities after.




Enquiries to Prof Graham Williams

Delivering MLHub as a Container


The Machine Learning Hub provides open access to a growing selection of demonstrators and tools from state-of-the-art Artificial Intelligence (AI), Machine Learning, and Data Science. The tools allow the technology to be explored and utilised to build powerful command pipelines. The growing collection of MLHub packages showcases specific capabilities that can be linked together.



This project will focus on developing a container-based delivery of the mlhub suite, using modern container packaging to ensure that mlhub and a selection of core packages can be installed easily on Linux, macOS, or Windows.



The project is suitable for students interested in exploring state-of-the-art technology and building capability for others to utilise sophisticated tools easily. You will research container technology for packaging mlhub.



This activity is suitable for a 6 unit individual project, for a student interested in developing practical experience with AI and container technology, and making that technology more readily accessible.




Enquiries to Prof Graham Williams

STUDENT PROJECT ML Hub Package for Pupil Analysis


The Machine Learning Hub provides ready access to a growing selection of demonstrators and tools from state-of-the-art Artificial Intelligence (AI), Machine Learning, and Data Science. The tools allow the technology to be explored and utilised to build powerful AI command pipelines. The growing collection of MLHub packages showcases specific capabilities that can be linked together.



This project will implement eye pupil analysis using open source tools from repositories including the Pupil Labs Project. The project will implement a demonstration script and a suite of scripts for specific tasks using the MLHub framework. As you implement the capabilities in MLHub you may also take the opportunity to enhance the open source MLHub framework itself.



The project is suitable for students interested in exploring state-of-the-art AI technology and building capability for others to utilise the tools. You will build an MLHub package from the already existing open source repositories rather than further developing the algorithms for analysis.



This activity is suitable for a 6 or 12 unit individual project, for a student interested in developing practical experience with AI technology, and making that technology more readily accessible.




Enquiries to Prof Graham Williams

Data Privacy for a Flutter Notebook


A POD is a personal/private online data store designed to ensure our data remains under our control. To help us manage our own private data, the SOLID project provides a server for privately storing application data. Google’s Flutter, meanwhile, has become a popular mobile app development framework and is also being adopted for desktop and web app development.



This project will build on the initial template implementation of a Flutter app targeting mobile devices, desktops, and browsers. The app is a note taker that will store its data (markdown documents) in the cloud on a POD hosted on a SOLID server. Notes can be created, edited, indexed, searched, and tagged. They will be stored on the SOLID server conforming to an RDF schema.



The project is suitable for a student interested in exploring the latest approaches to app development, utilising modern software engineering practices, coupled with pioneering work to integrate privacy as a matter of priority into our apps.



This activity is suitable for a 6 unit individual project, for a student interested in developing a practical application whilst experiencing an end-to-end software engineering project, with privacy focused technology.




Enquiries to Prof Graham Williams

We are delivering!

Our mission is to deliver technology for the public good, and in particular technology and projects that support your privacy. We are grateful to the teams of talented people working with us. Some of our projects include podnotes, solidpod, yarrabah, and solid community au.