Analyzing a Terabyte of Game Data
YOW! Data 2021
A couple of terabytes of data is not impressive by today's standards. A hard drive of that capacity costs about a hundred dollars. But things quickly get complicated when one needs to draw insights from a corpus of unstructured game scenarios that grows by a terabyte a year.
You will hear some lessons learned by a data scientist wearing the extra hat of a data engineer on this fun side project. The talk will cover topics ranging from using the Apache Spark distributed computing framework and optimizing Delta tables to making sense of the resulting mega-dataset with graph theory and an interactive Streamlit application.
Rimma Shafikova is a Data Scientist at the Perth-based social gaming company
VGW and a certified Neo4j professional. She uses graph databases and graph
algorithms to aid VGW’s rapidly growing poker business, particularly in the area
of game integrity and poker fraud detection.