2019-10-14
Today our keynote speaker Errol Koolmeister, head of AI tech & architecture at H&M Group and acting head of Data Science, shares how his work as a data scientist changed from crunching Excel sheets to using AI pipelines in production. Meet Errol on stage at the 6th PyCon Sweden at Münchenbryggeriet from 31st October to 1st November.
Christine: Can you give some hints about the topic of your talk?
Errol: In my talk I will share some of the learnings on setting up and scaling an AI function with a focus on the tech side. I will share how we create software that can go from a few markets in early proof of concept into global industrialization with a short timeline. I will share our architecture to accomplish this and highlight some of the success factors of our work.
Christine: You have worked as a data scientist throughout your whole career. What is it that fascinates you about data science?
Errol: Well, most of my professional life I have worked as a data scientist. What fascinates me is the insights we can derive from data and putting them into practical use: becoming more relevant in general and being able to make informed decisions with the help of data.
Christine: Which tips or tools would you recommend to data scientists to maximize their insights? Where could they use Python?
Errol: I am one of those who started out with Excel… This was early in my career, before I studied computer engineering and learned about SQL, databases and Java, and before finding Python 5+ years ago. The main speedup in my work came when I started saving my functions and generalizing them enough that they could be reused as libraries. Spending a bit of time after each task to capture my learnings has really helped me improve the time to insights and action compared to a few years ago. Someone once told me that programming gives you superpowers, and it is true. You constantly find new ways to process your data, and new libraries are being created every day.
Christine: As the lead of data science at H&M you are working with big data. Where do you see challenges in setting up advanced analytics with AI all the way into production?
Errol: The biggest challenge when working with large datasets is that you need to understand the practical implications of not fitting data into memory on your laptop or local server. You are entering the domain of distributed computing, which adds an additional layer of complexity that will impact your code. These challenges are a bit easier to handle today, compared to a few years ago, thanks to Spark/PySpark. Other than that, the data science process is still the same. You are exploring your data, modelling it and then predicting the future. The main challenge is to do this in an efficient way. You need to write good code following best practices if you want to reach production and stay there.
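Editor's note: the point about data that does not fit into memory can be illustrated with a minimal sketch in plain Python. It is not H&M's code and does not use Spark; the column names `market` and `sales`, the sample data and the helper names are all hypothetical. The idea it shows is the one distributed frameworks build on: compute a partial result per chunk, then merge the partial results.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def read_in_chunks(rows: Iterable[dict], chunk_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size rows, so only one chunk is held in memory."""
    chunk: List[dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def partial_totals(chunk: List[dict]) -> Counter:
    """'Map' step: sum sales per market within a single chunk."""
    totals: Counter = Counter()
    for row in chunk:
        totals[row["market"]] += float(row["sales"])
    return totals


def total_sales_by_market(csv_file, chunk_size: int = 2) -> Dict[str, float]:
    """'Reduce' step: merge the per-chunk totals into one result."""
    merged: Counter = Counter()
    for chunk in read_in_chunks(csv.DictReader(csv_file), chunk_size):
        merged.update(partial_totals(chunk))  # Counter.update adds values
    return dict(merged)


# Hypothetical sample data standing in for a file too large to load at once.
SAMPLE = "market,sales\nSE,10.0\nDE,5.5\nSE,2.5\nUS,7.0\nDE,1.5\n"
result = total_sales_by_market(io.StringIO(SAMPLE))
print(result)
```

In a real distributed setting the chunks would live on different machines and the merge would happen over the network, which is where the extra complexity Errol mentions comes from.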
Christine: What was your favourite project/proof of concept that became real? And what factors made it successful?
Errol: Well, we have one project here at H&M that we started in January and that just started delivering into production (September). The key success factors were that we started with a strong team that focused on the end-to-end process with software engineering best practices. That meant creating an environment for data scientists to work on the exploratory part, feature engineering and modelling, and then having a fully automated pipeline for training and prediction. The end result is really cool and I really enjoy watching the vision we created early in the year now delivering value.
Christine: Where do you see data science in production develop in the future?
Errol: We need to work more on how we handle our models in production. One of the big challenges right now is that we can create something that goes into production, but we are not that good at following up on the results, or, when we do, the feedback loop is usually too long. This needs to improve for everything that goes into production. Things change and we need to understand that change and act accordingly.
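Editor's note: one simple way to shorten such a feedback loop is to compare predictions with actual outcomes as they arrive and raise a flag when the recent error grows. The sketch below is only an illustration under that assumption, not a description of H&M's setup; the class name `ErrorMonitor`, the window size, the threshold and the numbers are hypothetical.

```python
from collections import deque
from statistics import mean


class ErrorMonitor:
    """Track the mean absolute error over the most recent predictions."""

    def __init__(self, window: int, threshold: float) -> None:
        # deque(maxlen=...) silently drops the oldest error once the window is full
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted: float, actual: float) -> bool:
        """Store one prediction/outcome pair; return True if an alert should fire."""
        self.errors.append(abs(predicted - actual))
        return mean(self.errors) > self.threshold


monitor = ErrorMonitor(window=3, threshold=2.0)

# Hypothetical (predicted, actual) pairs; the last two drift away from reality.
pairs = [(10, 10), (12, 11), (9, 14), (8, 15)]
alerts = [monitor.record(predicted, actual) for predicted, actual in pairs]
print(alerts)
```

A rolling window keeps the check sensitive to recent change rather than averaging it away over the model's whole lifetime.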
Christine: Which topics would you like to explore further?
Errol: I would personally like to spend more time on graphs and explore how we can be better at using them. One of the topics I am really interested in but haven’t had the time to explore fully is knowledge graphs. They are really interesting concepts.
Learn more from the best and meet Errol live on stage at PyCon Sweden 2019. To secure your ticket click here and subscribe to PyCon Sweden social media channels on Facebook, Twitter and Instagram to receive updates about the event.
Author: Christine Winter, software developer at Ivbar & PyCon Sweden volunteer
2019-10-05
Today our keynote speaker Shammamah Hossain, a McGill University alumnus with a joint degree in physics and computer science and a software engineer at Plotly, tells us about the Python community in Montreal, her experiences of working in research compared with software development, and what we will learn about Plotly's products during the workshop at the 6th PyCon Sweden at Münchenbryggeriet in the center of Stockholm.
Christine: Can you give some hints about the topic of your talk?
Shammamah: I'm very excited and honoured to be speaking at this year's PyCon! I know that the audience comprises people from all walks of life and many different industries, so I'll be talking about the importance of analytics and visualization workflows across all of those worlds.
Christine: How is the Python community in Montreal?
Shammamah: Montreal is actually a fairly prominent tech hub in Canada. There are many Python-related meetups in the city, which is wonderful -- and some of them actually take place at Plotly HQ! It's great to be able to share knowledge with fellow programmers face-to-face, instead of just over GitHub.
Christine: During Plotly’s workshop at PyCon Sweden 2019 we will learn more about data visualization tools. What excites you about the applications you are working with, Dash Bio and Dash DAQ?
Shammamah: The way that human beings best communicate and interact with data, in my opinion, is intuitive and based on our senses; we do not always immediately understand sets of numbers on a screen, but we do have an innate understanding of colours, shapes, and sizes that has been developed over thousands of years. This is particularly relevant in manufacturing processes and bioinformatics projects, both of which rely heavily on understanding large volumes of often complex data.
Being able to work on tools like Dash Bio and Dash DAQ means that I am playing a (small) part in "translating" computer-readable information into human-interpretable visualizations, and I personally find it incredibly exciting to see these projects being used "in the wild" for many different applications.
Christine: You were working as a research assistant before starting at Plotly. Where do you see the differences in developing programs for research projects compared to software products?
Shammamah: Research projects are generally far more specialized than software products. I remember that as part of my undergraduate thesis project, I wrote quite a bit of code that involved creating a simple neural network that was used to differentiate between two very specific particle collision events which involved either an electron or a photon. I can't imagine this being incredibly useful anywhere outside of that particular field of research (unless it is, in which case I would be very excited). As a consequence of that, the only people looking at and using my code were researchers from the same lab -- people with incredible levels of domain-specific knowledge who understood the "story" that my program was telling, and who could come to me directly if they weren't sure about the reasoning behind a particular design decision I made.
In contrast, software products need to be much more robust. Working on open-source projects like Dash has opened my eyes to the many things that need to be considered when developing software that is used by thousands of people worldwide -- for example, modularity, readability, and writing good tests. These things should be paramount when developing any software, whether it's for personal use or public use; however, the bar is much higher (simply because it needs to be) for software products that are created to serve a more varied audience.
Christine: What do you like about programming in Python? Do you have a favourite library?
Shammamah: My favourite thing about Python is its accessibility; besides being useful in and of itself, it has the additional benefit of serving as a friendly introduction to the wide, wonderful world of software development. It's a language that has almost a one-to-one mapping with the way that humans reason through problems, and that makes it incredibly valuable as a medium for teaching the core concepts of programming.
I appreciate the power that other, lower-level languages have; I studied physics and computer science, so I know that it's important to understand the physical limitations of hardware and the contingencies that come hand-in-hand with that. However, I also acknowledge that things like memory allocation and efficient caching can be confusing for beginners to understand immediately.
As for a favourite library, I don't think I could have gotten through my degree without extensive use of numpy -- so I'll choose that!
Christine: What would you like to explore in the future?
Shammamah: So many things! I'm hoping that I'll be able to take an "outer space vacation" at some point during my life. On a more grounded note, I'm very interested in quantum computing; it's incredibly exciting to see a field that has a wealth of knowledge and applications that is yet to be discovered.
We are excited to have Shammamah coming all the way from Montreal to PyCon Sweden. If you want to see her live on stage and learn about Plotly's Dash Bio and Dash DAQ during a workshop, get your ticket here and subscribe to PyCon Sweden social media channels on Facebook, Twitter and Instagram to receive updates about the event.
Author: Christine Winter, software developer at Ivbar & PyCon Sweden volunteer
2019-09-12
Less than two months to go until the doors open to the 6th PyCon Sweden at Münchenbryggeriet in the center of Stockholm.
Today our keynote speaker Tess Ferrandez, software development engineer at Microsoft, shares her insights into the machine learning journey, from getting started to challenges in production. She also explains what we developers can learn from machine learning algorithms and what makes the heart of an engineer beat faster.
Christine: Can you give some hints about the topic of your talk?
Tess: I discussed it a little with Anna (Anna Kazakova Lindegren, chairwoman of PyCon Sweden). I have a keynote prepared already about AI and ethics, but I also have a session about my team's work with video action detection (soccer highlights and other use cases) with deep learning, and video processing using Python libraries like Keras and OpenCV. Anna said the second option was better and that it was ok to keep it pretty technical as well (given the crowd) rather than only inspirational.
Christine: Of the material and resources that you share online, which one would be a good first step towards machine learning?
Tess: My two favourite starter resources for deep learning are François Chollet's book "Deep Learning with Python" and Andrew Ng's Coursera deep learning specialization "deeplearning.ai". When you're past starter, I think a good way to really understand deep learning is to review and try to implement papers. On our YouTube channel "Machine Learning at Microsoft" we do a weekly paper review, where we discuss the ins and outs of many important ML papers - on this channel we also talk about many of the projects we work on. I can add my notes from the Andrew Ng course to use as a companion for his video series.
Christine: Which Python libraries do you use regularly and which ones would you recommend?
Tess: Apart from the obvious, NumPy and pandas, I frequently use Keras + TensorFlow for deep learning, and OpenCV for video/image pre-processing. I also like RetinaNet for object detection. I also frequently use a library some of the guys on my team created, mPyPl, a functional library (with pipes) for speedy processing of large amounts of videos and images.
Christine: Which projects do you like working with and what excites you about them?
Tess: I am an engineer at heart; I was a .NET dev for a long time before starting with Python and machine learning. As an engineer, I am allergic to any project where we gratuitously use machine learning, as in using it because it is "cool" even though there are other, better ways to solve the problem. I'm excited about projects where we solve real business needs in the best way possible. In my keynote [assuming I do the 2nd one] I will talk about how we combine engineering and deep learning.
Christine: Did you become a better soccer player from teaching an algorithm about it? (What can we learn from teaching machine learning algorithms and its outcomes?)
Tess: Interesting question, and one that I am personally thinking about a lot. While I didn't really become a better soccer player, we spent a lot of prep on understanding what makes an interesting play in soccer, as the goal of the algos was to determine "interesting passages" in the soccer match. In general, machine learning algorithms think very differently than humans, which also makes them quite hard to interpret. This is also something I touch on in my session, and depending on how interpretable you need your models to be you may need to make different choices.
Christine: Where do you see current challenges for machine learning and artificial intelligence in production?
Tess: The question of interpretability is a major one. If we don't know how the model makes its predictions, it may make very bad decisions, such as acting racist, sexist or otherwise non-inclusive, or being just plain wrong. On the other hand, if we know exactly how we make these decisions, we can implement it as a rule-based system. We need to be somewhere in between, but being able to trust the system is one big challenge. A second, related challenge is that models can often be very fragile. They may work perfectly for exactly one set of data, but if you just change a few words in a text, or the lighting of an image, they may fail completely. Making the model more robust is a hard challenge, and sometimes also a security problem.
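Editor's note: the fragility Tess describes can be probed with a simple perturbation check: feed the model slightly changed versions of an input and see whether the prediction stays the same. The sketch below is a toy illustration, not her team's method; `toy_sentiment` is a hypothetical keyword-based stand-in for a real model, and the sentences are made up.

```python
from typing import Callable, Iterable


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in model: 'positive' if any known keyword appears."""
    positive_words = {"great", "good", "love"}
    words = text.lower().split()
    return "positive" if any(word in positive_words for word in words) else "negative"


def is_robust(model: Callable[[str], str], original: str, variants: Iterable[str]) -> bool:
    """Return True if every perturbed variant gets the same label as the original."""
    baseline = model(original)
    return all(model(variant) == baseline for variant in variants)


original = "I love this jacket"

# Adding a word keeps the keyword, so the prediction is unchanged.
stable = is_robust(toy_sentiment, original, ["I really love this jacket"])

# Swapping in a synonym the model has never seen flips the prediction.
fragile = is_robust(toy_sentiment, original, ["I adore this jacket"])

print(stable, fragile)
```

The same pattern applies to images, for example by varying brightness, although generating meaningful perturbations for a real model takes considerably more care than this toy suggests.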
Christine: What subjects do you want to look into more in the future?
Tess: The subjects of interpretability, robustness and ethics are some of the areas I find most interesting, and the mix between traditional engineering and machine learning. Technically, I love anything vision (video, images etc.), but another area I spend a lot of time on during our projects is proper engineering practices, and DevOps for machine learning, so that it actually has a chance of making it into production.
Learn more from the best and meet Tess live on stage at PyCon Sweden 2019. To secure your ticket click here and subscribe to PyCon Sweden social media channels on Facebook, Twitter and Instagram to receive updates about the event.
Author: Christine Winter, software developer at Ivbar & PyCon Sweden volunteer