Thus far, the security community has treated machine learning as a research problem. The painful oversight has been assuming that laboratory results would translate easily to the real world, and therefore not devoting sufficient focus to bridging that gap. Researchers enjoy the luxury of neat, bite-sized datasets to experiment on, but the harsh reality of millions of potentially malicious files streaming in daily soon hits would-be ML practitioners in the face like a tsunami-sized splash of ice water. And while in research there's no such thing as "too much" data, dataset sizes confront real-world cybersecurity professionals with tough questions: "How will we store these files efficiently without hampering our ability to use them for day-to-day operations?" or "How do we satisfy competing use cases, such as the need to analyze specific files and the need to run analyses across the entire dataset?" Or, maybe most importantly: "Will my boss have a heart attack when he sees my AWS bill?"
In this talk, we will give a live demonstration of the system we've built using a variety of AWS services, including DynamoDB, Kinesis, and Lambda, as well as some more cutting-edge services such as Redshift and ECS Fargate. We will explain in depth how the system works and how it answers the difficult questions of real-world ML, such as the ones listed above. This talk offers a rare look into the guts of a large-scale machine learning production system. Audience members will leave with the tools and understanding to tackle such problems confidently themselves, along with a bedrock of immediately practical knowledge for deploying large-scale, on-demand deep learning in the cloud.