It has been over 10 years since the ‘Data Deluge’ became a phenomenon of universal interest and the multidisciplinary area of data science emerged to harness the potential of Big Data.
The rise of Big Data, alongside an almost obscene amount of funding from Big Tech, has resulted in game-changing advancements. The most recent is ChatGPT – the latest in a series of developments in the realm of Generative AI.
Generative AI is a type of artificial intelligence that uses deep learning techniques to create new and unique data, rather than just making predictions or classifications based on pre-existing data.
There seem to be endless possibilities and opportunities for creativity and productivity through Generative AI, like writing an essay, producing code, or composing music – and even more when multiple AI models are combined. For example, Stable Diffusion can generate images from a textual description.
Over the last few months, a question has been keeping me awake as I see researchers, students, professionals, and children interacting with Generative AI. Are we ready, at an individual and societal level, to fully harness the potential of what these technologies – built from large datasets and opaque models – can offer? This is, of course, a multi-faceted and highly complex question. I identify three areas that I think need our attention now.
1. Training for quality
Training data is the linchpin of these advanced models. I know from my work on Information Resilience that while the quantity of data can drive performance, the quality characteristics, including those that reduce bias, toxicity, profanity and harm, are much harder to train for.
Evaluating data quality and ensuring its fitness for purpose requires not just technical prowess and a hefty budget for data curators, but also a foundational set of values that carry through into the data curation, cleaning and labelling activities.
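To make this concrete, here is a minimal, purely illustrative sketch of rule-based screening for text training data. The blocklist, threshold, and function names are hypothetical placeholders of my own; real curation pipelines rely on far richer classifiers, crowd-sourcing, and human review.

```python
# Hypothetical sketch: simple fitness-for-purpose checks on text records.
# BLOCKLIST and MIN_LENGTH are placeholder values, not a real curation policy.

BLOCKLIST = {"badword1", "badword2"}  # stand-in for profanity/toxicity terms
MIN_LENGTH = 20                       # minimum characters to be informative

def screen_record(text: str) -> dict:
    """Return simple quality flags for one training record."""
    tokens = set(text.lower().split())
    return {
        "too_short": len(text) < MIN_LENGTH,
        "contains_blocked_term": bool(tokens & BLOCKLIST),
    }

def filter_corpus(records: list[str]) -> list[str]:
    """Keep only records that pass every screening check."""
    return [r for r in records if not any(screen_record(r).values())]
```

Even this toy example shows where values enter the pipeline: someone must decide what goes on the blocklist and what counts as "too short", and those decisions propagate into every model trained on the filtered data.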
At the Australian Research Council Training Centre for Information Resilience, we are working with our industry and government partners to build knowledge and workforce capacity for tackling the challenges of Information Resilience relating to:
- Responsible use of data: To create and support capacity for responsible management of data assets through principled approaches to data governance, access and sharing.
- Data curation at scale: To build new data curation methods through machine learning, crowd-sourcing and human-in-the-loop techniques to achieve data curation at scale.
- Algorithmic transparency: To enable and promote interpretability, uncertainty quantification, unbiasedness, transparency, and reproducibility into the design of learning algorithms.
- Trusted data partnerships: To improve data literacy and trust in data linking within the wider community, working towards reducing barriers in data sharing and flow of knowledge.
- Agility in value creation from data: To enable agile deployment of data driven solutions within IT landscapes and business processes.
2. Addressing the skills shortage
A second area that needs our attention is the global skills shortage for qualified data scientists and machine learning engineers. The skills shortage and a lack of basic consumer-level digital skills can contribute to expanding the digital divide. There is an evident and urgent need to invest in digital and data talent pipelines at all levels.
I cannot emphasise enough the importance of nurturing a homegrown expert base of research leaders who not only use but also build cutting-edge technologies and have a deep understanding of the so-called impenetrable black boxes such as Generative AI models. Without this talent pipeline and expert base, we are importing not only foreign technologies but also the value systems embedded in them.
3. Making AI accessible
We know that progress is asymmetrical. While AI growth for consumer internet companies like Amazon, Google, Baidu, Alibaba, and Apple has been phenomenal, other sectors – including manufacturing, finance, and agriculture – have yet to harness the full potential that current AI solutions can offer. We still need to overcome fundamental scientific challenges to make the value of AI and data science more accessible to the broader span of business and industry sectors.
We are in the midst of game-changing advancements in computing that have the potential to assist with some of the biggest challenges of our times. My hope is that as we engage in healthy debates about the benefits and limitations of these technologies, we do not become polarised in our views, which would stifle innovation and progress.
Professor Shazia Sadiq FTSE
Data Engineer & Director, University of Queensland
Professor Shazia Sadiq FTSE has made lasting contributions to responsible and integrated solutions for effective information processes and data quality management. These contributions have substantially influenced international research activity in the field. She is a champion of trans-disciplinary work and through her foresight and capacity for collaboration, she has repeatedly encouraged and managed successful outcomes from diverse teams. Shazia is a keen proponent of ICT careers and has led and developed a range of programs such as national competitions and women-in-computing initiatives, which have engaged and benefited thousands of young people.