Conference Program

All times displayed are in the Australia/Sydney timezone

8:45 AM

8:45 AM - 15 mins

Session Overviews and Introductions

9:00 AM

9:00 AM - 45 mins

Grand Ball Room

Playing with Words: Building Products with NLP

Hilary Mason

Imagine machines that interact with us using the same interface we use to interact with each other — spoken language! Recent progress in NLP has opened up new possibilities for language-based systems. In this talk, we'll explore the recent history of language models and highlight novel applications of statistical and deep learning approaches. Then, we'll explore emerging products that automate, generate, and create using these models, and discuss the implications for building them, including safety, ethics, and the invention of new design metaphors. Finally, we'll speculate about where this might take us in the next few years. Can machines ... play?

 
9:45 AM

9:45 AM - 25 mins

Break / Q&A with Hilary Mason

10:10 AM

10:10 AM - 30 mins

Grand Ball Room 1

Using AI to Mine Unstructured Research Papers to Fight COVID-19

Jennifer Marsman

There is an overwhelming amount of information (and misinformation) about COVID-19. How can we use AI to better understand this disease? In this session, we take an open dataset of research papers on COVID-19 and apply several machine learning techniques (named entity recognition of medical terms, finding semantically similar words, contextual summarization, and knowledge graphs) that can help first responders and medical professionals better find and make sense of the research they need. We will dive into the techniques used and share the code repository, so developers will walk away with an understanding of how to build a similar solution using Cognitive Search.

10:40 AM

10:40 AM - 25 mins

Break / Q&A with Jennifer Marsman

11:05 AM

11:05 AM - 30 mins

Grand Ball Room 1

Scaling the Machine Learning Platform at DoorDash

Hien Luu

DoorDash’s mission is to grow and empower local economies. DoorDash’s business is a 3-sided marketplace composed of Dashers, consumers, and merchants.

As DoorDash's business grows, it is essential to establish a centralized ML platform to accelerate the ML development process and to power the numerous ML use cases. We are making good progress, but we are still in the early days of building out our ML platform.

This presentation will detail the DoorDash ML platform journey: how we establish close collaboration with the Data Science community, how we intentionally set guardrails in the early days to enable us to make progress, our principled approach to building out the ML platform while meeting the needs of the Data Science community, and finally the technology stack and architecture that powers billions of predictions per day and supports a diverse set of ML use cases. These use cases include search ranking, recommendation, fraud detection, food delivery assignment, food delivery arrival time prediction, and more.

11:35 AM

11:35 AM - 25 mins

Break / Q&A with Hien Luu

12:00 PM

12:00 PM - 30 mins

Grand Ball Room 1

Evolving the ML Platform organisation at Netflix: a case study

Julie Amundson

Do you wish there was a Machine Learning model to tell you how to structure your ML teams? So do I! While we're waiting for that, I'll share the story of how the ML Platform organisation evolved at Netflix. Although this story is specific to our own journey to expand Netflix ML investments, there are a few lessons learned along the way that you'll be able to relate to. There are several factors going into org structure that we'll discuss, including: the specialty and skillsets of ML practitioners, the variety and depth of ML use cases, who's responsible for the data, the ownership model as ML projects go to production, and how the underlying Platforms are situated. I look forward to sharing and hearing your own thoughts afterward!

12:30 PM

12:30 PM - 25 mins

Break / Q&A with Julie Amundson

12:55 PM

12:55 PM - 30 mins

Lunch

1:25 PM

1:25 PM - 30 mins

Grand Ball Room 1

Taming the Long Tail of Industrial ML Applications

Savin Goyal

Data Science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business - from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, all of which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. Our data scientists, at the same time, are expected to build, deploy, and operate complex ML workloads autonomously without needing significant experience in systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing the model’s lifecycle end-to-end, and how a focus on human-centric design positively affects our data scientists' velocity.

1:55 PM

1:55 PM - 25 mins

Break / Q&A with Savin Goyal

2:20 PM

2:20 PM - 30 mins

Grand Ball Room 1

Assisting design with machine learning in Canva’s editor

Will Radford

Our team at Canva focuses on building features that make design simple, enjoyable and collaborative for more than 55 million people across the globe. For many who haven’t used design tools, starting with a blank page can be intimidating, which is where Canva’s library of more than 500,000 templates comes in. Unfortunately, switching between templates once required retyping your content. To fix this, we created a feature for our users to bring their text with them while exploring the library. The initial challenge was that the template metadata the feature relied on was scarce and costly for our in-house designers to annotate.

We wanted to predict metadata for our designers inside the Canva editor, but had to consider a number of real-world engineering tradeoffs. First, we’ll explain the user problem and provide a glimpse inside some of our templates and the metadata that enables text transfer. Then, we’ll explain what features we extracted for our scikit-learn random forest classifier and how we combined it with a designer-in-the-loop to bootstrap enough batch-predicted metadata to launch an MVP version of the feature. Finally, we’ll explain how we decided to reimplement model storage and inference in our TypeScript frontend stack. Creating this new feature was a joint effort made possible by a multidisciplinary team of designers, engineers and data scientists. We’re looking forward to sharing some of the lessons we learned along the way to shipping this smart feature.

2:50 PM

2:50 PM - 25 mins

Break / Q&A with Will Radford

3:15 PM

3:15 PM - 30 mins

Grand Ball Room 1

Yepoko Lessons For Machine Learning on Small Data

Xuanyi Chew

Let's face it, in most companies, the amount of good data available to perform machine learning is very small. Most data are small data. So how can we do good machine learning on small data?

3:45 PM

3:45 PM - 25 mins

Break / Q&A with Xuanyi Chew

4:10 PM

4:10 PM - 30 mins

Grand Ball Room 1

Lessons learned from building ML products

Mikio Braun

Building products based on machine learning requires much more than taking an ML algorithm and deploying it in the cloud. Based on my experience as a researcher, in ecommerce, and as an independent consultant, I'll talk about some of the lessons learned about what is needed beyond pure ML algorithms to successfully build products with ML. How do you identify customer problems that can be tackled with ML? What does the technology landscape around ML look like? How do you set up teams and organizations to be "AI ready?" I'll be sharing some of my observations and insights.

4:40 PM

4:40 PM - 25 mins

Break / Q&A with Mikio Braun

5:05 PM

5:05 PM - 45 mins

Grand Ball Room

Do you want ML with that? When to say yes and why to say no.

Kendra Vant

In this talk, I'll speak about why you should only use ML when you really need to, some techniques we've used successfully at Xero to help cut through the noise and analysis paralysis, and why it might help to approach building a system with ML inside the same way you might decide what car to buy.

5:50 PM

5:50 PM - 25 mins

Break / Q&A with Kendra Vant

8:45 AM

8:45 AM - 15 mins

Session Overviews and Introductions

9:00 AM

9:00 AM - 45 mins

Grand Ball Room

Building & Operating Autonomous Data Streams

Sid Anand

The world we live in today is fed by data. From self-driving cars and route planning to fraud prevention to content and network recommendations to ranking and bidding, the world we live in today not only consumes low-latency data streams, it adapts to changing conditions modeled by that data. 

 

While the world of software engineering has settled on best practices for developing and managing both stateless service architectures and database systems, the larger world of data infrastructure still presents a greenfield opportunity. To thrive, this field borrows from several disciplines: distributed systems, database systems, operating systems, control systems, and software engineering, to name a few.

 

Of particular interest to me is the subfield of data streams, specifically how to build high-fidelity nearline data streams as a service within a lean team. To build such systems, relying on human operations is a non-starter. All aspects of operating streaming data pipelines must be automated. Come to this talk to learn how to build such a system soup-to-nuts.

9:45 AM

9:45 AM - 25 mins

Break / Q&A with Sid Anand

10:10 AM

10:10 AM - 30 mins

Grand Ball Room 1

Data Rainbows - select * from cloud;

Nathan Wallace

Drowning in a lake? Stuck inside a warehouse? See your data in a different light! Postgres Foreign Data Wrappers provide SQL queries to live cloud data - all the structure and much lighter weight. In this session, we'll explore the potential of Data Rainbows for growing cloud environments and outline the challenges of working with data you can see but can't quite touch.

 
10:40 AM

10:40 AM - 25 mins

Break / Q&A with Nathan Wallace

11:05 AM

11:05 AM - 30 mins

Grand Ball Room 1

Data Mesh; A principled introduction

Zhamak Dehghani

For over half a century, organizations have assumed that data is an asset to collect more of, and that data must be centralized to be useful. These assumptions have led to centralized and monolithic architectures, such as data warehouses and data lakes, that limit organizations' ability to innovate with data at scale.

 
Data Mesh is an alternative architecture and organizational structure for managing analytical data.
Its objective is to enable access to high-quality data for analytical and machine learning use cases at scale.

It's an approach that shifts the data culture, technology and architecture:
- from centralized collection and ownership of data to domain-oriented connection and ownership of data
- from data as an asset to data as a product
- from proprietary big platforms to an ecosystem of self-serve data infrastructure with open protocols
- from top-down manual data governance to a federated computational one.

In this talk, Zhamak will introduce the principles underpinning Data Mesh and its architecture.

11:35 AM

11:35 AM - 25 mins

Break / Q&A with Zhamak Dehghani

12:00 PM

12:00 PM - 30 mins

Grand Ball Room 1

Apache Pulsar and the Streaming Ecosystem

Matteo Merli

Apache Pulsar is an open-source distributed pub-sub messaging system, developed under the stewardship of the Apache Software Foundation.

This talk will show how its unique architecture enables Pulsar to seamlessly support both streaming and messaging use cases in a single unified platform.

We will also show where Pulsar fits with the broader ecosystem of data streaming technologies and all the interoperability that is available out of the box, making it a particularly good choice for supporting any kind of data platform, where versatility, interoperability and scalability are the key requirements.

12:30 PM

12:30 PM - 25 mins

Break / Q&A with Matteo Merli

12:55 PM

12:55 PM - 30 mins

Lunch

1:25 PM

1:25 PM - 30 mins

Grand Ball Room 1

Foundations of Data Teams

Jesse Anderson

Successful data projects are built on solid foundations. What happens when we’re misled or unaware of what a solid foundation for data teams means? When a data team is missing or understaffed, the entire project is at risk of failure.

This talk will cover the importance of a solid foundation and what management should do to fix it. To do this I’ll be sharing a real-life analogy to show how we can be misled and what that means for our success rates.

We will talk about the teams in data teams: data science, data engineering, and operations. This will include detailing what each is, does, and the unique skills for the team. It will cover what happens when a team is missing and the effect on the other teams.

The analogy will come from my own experience with a house that had major cracks in the foundation. We were going to simply remodel the kitchen. We were never told about the cracks, and the house needed a completely new foundation. In a similar way, most managers think adding in advanced analytics such as machine learning is a simple addition (remodeling the kitchen). However, management is never told that you need all three data teams to do it right. Instead, management has to go all the way back to the foundation and fix it. If they don’t, the house (team) will crumble under the strain.

1:55 PM

1:55 PM - 25 mins

Break / Q&A with Jesse Anderson

2:20 PM

2:20 PM - 30 mins

Grand Ball Room 1

Sweet Streams are Made of These: Data Driven Development for Stream Processing

Caito Scherr

The strength of a powerful stream processing engine lies in how fast it can process data, and how much. This naturally adds complexity to existing integration points and can lead to development overhead. Luckily, there is a set of data-driven development principles that are built to alleviate precisely these challenges. This talk will go over what these are and how to apply them at various points throughout the development process, using real-world successes (and failures!) as examples. Although the examples are for highly complex systems, this talk will be beginner-friendly and applicable to non-streaming use cases.

2:50 PM

2:50 PM - 25 mins

Break / Q&A with Caito Scherr

3:15 PM

3:15 PM - 30 mins

Grand Ball Room 1

Analyzing a Terabyte of Game Data

Rimma Shafikova

A couple of terabytes of data is not impressive by today's standards. A hard drive of that capacity costs about a hundred dollars. But things quickly get complicated when one needs to draw insights from a corpus of unstructured game scenarios that are increasing at a rate of a terabyte a year. 

You will hear some lessons learned by a data scientist wearing the extra hat of data engineer on this fun side project. The talk will cover topics from using the Apache Spark distributed computing framework and optimizing Delta tables to making sense of the resulting mega-dataset with graph theory and an interactive Streamlit application.

 
3:45 PM

3:45 PM - 25 mins

Break / Q&A with Rimma Shafikova

4:10 PM

4:10 PM - 30 mins

Grand Ball Room 1

Islands in the Stream - What country music can teach us about event driven systems

Simon Aubury

Event driven systems are all the rage. It's with good reason we're witnessing a transformation with businesses adopting event driven systems. Building systems around an event-driven architecture is a powerful pattern for creating awesome data intensive applications. But before we sail away to another world, let's avoid the common pitfalls of designing & running event driven systems.

Islands in the Stream - what Kenny Rogers can teach us about event driven systems from the wisdom of a country music classic

4:40 PM

4:40 PM - 25 mins

Break / Q&A with Simon Aubury

5:05 PM

5:05 PM - 30 mins

Grand Ball Room 1

Rights, Sovereignty and Governance in Official Reporting: Considerations in the Use of Aboriginal and Torres Strait Islander data

Kalinda Griffiths

The realisation for Indigenous people in Australia to be counted in official statistics occurred in 1967. The identification of Indigenous people in Australia in national data highlights a range of historical and contemporary issues that require our attention. This includes how Indigenous people have been defined and by whom, as well as how identification is operationalised in official data collections. Furthermore, the completeness and accuracy of Indigenous people identified in the data and the impact this has on the measurement of health and wellbeing must also be taken into account. Official national reporting of Indigenous people is calculated using data from censuses, vital statistics, and existing administrative data collections and/or surveys. In alignment with human rights standards, individuals in Australia can opt to self-identify as ‘Indigenous’ in the data. Australia’s colonial context in which Aboriginal and Torres Strait Islander data is derived results in considerations about the sovereign rights of Indigenous people globally in the use of data and how this can be actioned through data governance processes.

5:35 PM

5:35 PM - 25 mins

Break / Q&A with Kalinda Griffiths