Session + Live Q&A
Modern Data Pipelines in AdTech—Life in the Trenches
Modern data pipelines help us solve a variety of tasks across different domains, including advertising. They allow us to process data more efficiently, with a diverse set of data transformation tools for both batch and streaming workloads. AdTech is a traditional industry, yet it constantly changes and innovates; today it draws a lot of attention as the industry moves toward a cookieless world.
In this talk, you will learn how modern data pipelines are used for reporting and analytics in AdTech, as well as for historical data reprocessing. We'll dive deeper into each case, exploring the problem itself, the implementation, the challenges, and future improvements. When business rules change or errors are found in past data, we need to reprocess our historical data, and it's not a trivial task: each step requires considerable time, precision, and computational resources. For this reason, a whole section of the talk will be devoted to approaches to historical data reprocessing and data lifecycle management.
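As a taste of the reprocessing problem, here is a minimal sketch of a date-partitioned backfill in Spark with Scala. Everything in it is an assumption for illustration: the S3 paths, the event_date/impressions/cpm columns, and the "corrected" CPM formula are hypothetical, not the pipeline described in the talk.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical backfill job: re-reads raw events for an affected date
// range, applies a corrected business rule, and rewrites only the
// partitions of the curated dataset that the range touches.
object ReprocessHistoricalData {
  def main(args: Array[String]): Unit = {
    val from = args(0) // e.g. "2022-01-01"
    val to   = args(1) // e.g. "2022-01-31"

    val spark = SparkSession.builder()
      .appName("historical-reprocessing")
      // Overwrite only the partitions written by this run, so the job
      // stays idempotent and can be re-run over any date range.
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()

    // Read just the slice of history that the rule change affects.
    val raw = spark.read
      .parquet("s3://ads-raw/impressions") // hypothetical path
      .where(col("event_date").between(from, to))

    // Apply the corrected business rule (here: a fixed CPM formula).
    val corrected = raw.withColumn(
      "revenue", col("impressions") * col("cpm") / 1000)

    // Rewrite only the affected partitions of the curated dataset.
    corrected.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://ads-curated/impressions") // hypothetical path

    spark.stop()
  }
}

The dynamic partition-overwrite mode is the key design choice in this sketch: it limits the rewrite to the partitions in the selected date range, which keeps a backfill cheap relative to a full-table rewrite and safe to retry.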
Main Takeaways
1. Hear about using data pipelines in production, especially in advertising.
2. Learn how to deal with historical data reprocessing and data lifecycle management.
What is the focus of your work these days?
I mostly work on Captify's data pipelines, specifically in advertising. I joined the company almost three years ago. Besides maintenance, my main focus now is the creation of new pipelines, the adoption of new technologies, modernizing the pipelines we already have, and changes to the data infrastructure. So there's a lot of innovation as well.
And what's the motivation behind this talk?
To give more production-level examples of what Big Data engineers face when they work on a specific use case. I noticed at various events that a lot of talks focus on frameworks and how they can be used, but they often don't give a very practical understanding of how to apply them to pipelines, beyond "if you build it, it will work." That's why I wanted to share this information with people who work on various pipelines, so they would learn how pipelines fail in production, what kinds of issues can happen in general, and what kinds of pipelines can be used for specific tasks within the domain.
And how would you describe the persona and level of your target audience for this session?
I would describe this persona as a middle-to-senior-level Big Data engineer, or an architect who is going to modernize pipelines, or anyone in a company trying to understand how to improve their current data infrastructure and ecosystem. They probably have issues with existing pipelines and are trying to figure out how to solve them. I'd also like to share some tips on how to approach various tasks.
What would you like this persona to walk away with after your session?
I would like this persona to walk away with a better understanding of how to approach the issues they already have in their company, or with new knowledge about the advertising domain and its challenges, and also with an understanding of the importance of knowing the product well. None of the approaches I present is a gold standard; there are many approaches to the same problem, and the ones I present reflect the experience of many engineers at the company where I work. All of them can be improved in the future, and other approaches exist. A conference is a great place to share these experiences with the community as well.
Speaker
Roksolana Diachuk
Big Data Engineer @Captify
Roksolana works as a Big Data Engineer at Captify. She speaks at technical conferences and meetups and is one of the Women Who Code Kyiv leads. She is passionate about Big Data, Scala, and Kubernetes. Her hobbies include building technical talks around fairytales and discovering new cities.
From the same track
Taming the Data Mess, How Not to Be Overwhelmed by the Data Landscape
Wednesday May 18 / 09:00AM EDT
The data engineering field has evolved at a tremendous pace in the last decade. New systems that enable the processing of huge amounts of data have generated enormous opportunities, as well as challenges, for software practitioners. All these new tools and methodologies created a new set of...
Ismaël Mejía
Senior Cloud Advocate @Microsoft
Data Versioning at Scale: Chaos and Chaos Management
Wednesday May 18 / 10:10AM EDT
Version control is fundamental when managing code, but what about data? Our data changes over time: first, since it accumulates, we have new data points for new points in time. But that is not the only reason. We also have data added for past times, since we were able to get additional...
Dr. Einat Orr
Co-creator of @lakeFS, Co-founder & CEO of Treeverse
Orchestrating Hybrid Workflows with Apache Airflow
Wednesday May 18 / 12:30PM EDT
According to analysts, 87 percent of enterprises have already adopted hybrid cloud strategies. Customers have many reasons to support hybrid environments, from maximizing the value of heritage systems to meeting local compliance and data processing regulations. As they build...
Ricardo Sueiras
Principal Advocate in Open Source @AWS