MLOps Community
SRE for ML Infra // Todd Underwood // MLOps Coffee Sessions #23
Coffee Sessions #23 with Todd Underwood of Google: Followups from OPML Talks on ML Pipeline Reliability, co-hosted by Vishnu Rachakonda.
//Bio
Todd is a Director at Google and leads Machine Learning for Site Reliability Engineering. He is also Site Lead for Google’s Pittsburgh office. ML SRE teams build and scale internal and external ML services and are critical to almost every Product Area at Google.
Before working at Google, Todd held a variety of roles at Renesys. He was in charge of operations, security, and peering for Renesys’s Internet intelligence services, which are now part of Oracle's Cloud service. He also did product work for some early social products that Renesys worked on. Before that, Todd was Chief Technology Officer of Oso Grande, an independent Internet service provider (AS2901) in New Mexico.
--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Todd on LinkedIn: https://www.linkedin.com/in/toddunder/
Timestamps:
[00:00] Intro to Todd Underwood
[02:04] Todd's background
[08:54] What kind of vision do you "paint"?
[14:54] Playing a little bit "devil's advocate." Do you think that's even possible?
[19:36] "Start serving to make sure of having the possibility to get it out." How do you feel about that?
[23:56] What advice could you give to other people who want to bring ML professionals into their companies to make ML useful for them?
[29:53] Is it useful to use these new models?
[32:25] Do you feel like there would be a point where there would be a standard procedure?
[35:50] How machine learning breaks
[40:44] As an engineering leader, what's your advice to other engineering leaders in terms of how to make that reflection on your team's needs and failures...?
[48:42] It's the design that you're looking at as the problem, not the person.
[56:27] Do we think that people sold a bunch of stuff and now we were left with the results?
[1:00:46] Recommendations on readings, things to do to better hone our craft.
[1:03:35] The more you explore, the more you realize, what's going on? Where can I learn from?
[1:05:00] Since you are in the mode of predicting things and have a philosophical background, where do you see the industry going in the next 5 years as we create it?
Resources referenced in this episode:
https://www.youtube.com/watch?v=Nl6AmAL3i08&feature=emb_title&ab_channel=USENIX
https://www.youtube.com/watch?v=hBMHohkRgAA&ab_channel=USENIX
https://youtu.be/0sAyemr6lzQ
https://youtu.be/EyLGKmPAZLY
https://www.usenix.org/conference/opml20/presentation/papasian
https://www.usenix.org/system/files/login/articles/02_underwood.pdf
https://storage.googleapis.com/pub-tools-public-publication-data/pdf/da63c5f4432525bcaedcebeb50a98a9b7791bbd2.pdf