
What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (4+)
- April 22 - Convolutional Neural Networks Workshop
When and Where
- April 22, 2025
- 6:30 PM to 8:30 PM CET | 9:30 AM to 11:30 AM Pacific
- Workshops are delivered over Zoom
About the Workshop
Join us for a 12-part, hands-on series that teaches you how to work with images, build and train models, and explore tasks like image classification, segmentation, object detection, and image generation. Each session combines straightforward explanations with practical coding in PyTorch and FiftyOne, allowing you to learn core skills in computer vision and apply them to real-world tasks.
In this session, we’ll delve deeper into CNN architectures, focusing on upsampling, channel mixing, and semantic segmentation techniques. You’ll build a U-Net model for semantic image segmentation and inspect its predictions with FiftyOne.
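As a rough sketch of the ideas the session covers (not the workshop’s actual code; the class name and dimensions below are illustrative), a single U-Net decoder stage combines transposed-convolution upsampling with channel mixing over concatenated skip features:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One U-Net decoder stage: upsample, concatenate the encoder's
    skip connection, then mix channels with 3x3 convolutions."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Transposed convolution doubles the spatial resolution
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # After concatenating skip features, mix channels back down
        self.mix = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # (N, out_ch, 2H, 2W)
        x = torch.cat([x, skip], dim=1)  # channel-wise concat with encoder features
        return self.mix(x)

# Example: decode 256-channel 16x16 features using a 128-channel 32x32 skip
block = UpBlock(in_ch=256, skip_ch=128, out_ch=128)
x = torch.randn(1, 256, 16, 16)
skip = torch.randn(1, 128, 32, 32)
out = block(x, skip)
print(out.shape)  # torch.Size([1, 128, 32, 32])
```

A full U-Net stacks several such stages symmetrically against an encoder, ending in a 1x1 convolution that maps to one channel per segmentation class.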
These are hands-on maker workshops that use GitHub Codespaces, Kaggle notebooks, and Google Colab environments, so no local installation is required (though you are welcome to work locally if you prefer).
Workshop Resources
You can find the workshop materials in this GitHub repository.
About the Instructor
Antonio Rueda-Toicen, an AI Engineer in Berlin, has extensive experience in deploying machine learning models and has taught over 300 professionals. He is currently a Research Scientist at the Hasso Plattner Institute. Since 2019, he has organized the Berlin Computer Vision Group and taught at Berlin’s Data Science Retreat. He specializes in computer vision, cloud technologies, and machine learning. Antonio is also a certified instructor of deep learning and diffusion models in NVIDIA’s Deep Learning Institute.
- April 23 - Advanced Computer Vision Data Curation and Model Evaluation Workshop
When and Where
April 23, 2025 | 9:00 – 10:30 AM Pacific
About the Workshop
Are you looking for simpler and more accurate ways to perform common data curation and model evaluation tasks for your computer vision workflows?
Then this workshop with Harpreet Sahota is for you! In this 90-minute hands-on workshop, we’ll show you how to use FiftyOne’s panel and plugin framework to:
- Customize the FiftyOne App to work the way you want to work
- Quickly integrate FiftyOne with new models, datasets, and MLOps tools
- Automate common data curation and model evaluation tasks
- Streamline your computer vision workflows with less code and more clicks
Whether you are a beginner or an advanced user of FiftyOne, and whether you want to start customizing the dozens of existing plugins or create your own, there will be something for you in this workshop!
Prerequisites
A working knowledge of Python and basic familiarity with FiftyOne. All attendees will get access to the tutorials, videos, and code examples used in the workshop.
- April 24 - Munich AI, Machine Learning and Computer Vision Meetup (Impact Hub Munich GmbH, München)
Date and Time
April 24, 2025 from 5:30 PM to 8:30 PM
Location
Impact Hub Munich, Gotzinger Str. 8, 81371 Munich
Industrial Anomaly Detection – Challenges and Opportunities
Deep learning-based anomaly detection plays a key role in visual quality inspection and has received growing attention from the research community in recent years. However, reliably detecting anomalies remains a challenging problem. This talk provides an overview of the current state of the field, discussing recent progress, ongoing challenges, and potential future directions. We will explore both the limitations of existing approaches and opportunities for further improvement in real-world applications.
About the Speakers
Lars Heckler-Kram studied mechatronics and robotics at the Technical University of Munich (TUM) and received his Master’s degree in 2022. He is currently working toward his PhD with MVTec Software GmbH and the TUM School of Computation, Information and Technology. His research focuses on industrial visual anomaly detection.
Jan-Hendrik Neudeck received his Master’s degree in Computer Science with a specialization in computer vision and deep learning from the Technical University of Munich (TUM) in 2020. Since then, he has been working as a research engineer at MVTec Software GmbH.
Human Motion Prediction – Enhanced Realism via Nonisotropic Gaussian Diffusion
Predicting future human motion is a key challenge in generative AI and computer vision, as generated motions should be realistic and diverse at the same time. This talk presents a novel approach that leverages top-performing latent generative diffusion models with a novel paradigm. Nonisotropic Gaussian diffusion leads to better performance, fewer parameters, and faster training at no additional computational cost. We will also discuss how such benefits can be obtained in other application domains.
About the Speaker
Cecilia Curreli is a PhD student at the Technical University of Munich, specializing in generative models. A member of the AI Competence Center at MCML, she has conducted research in deep learning, computer vision, and quantum physics through international collaborations with the University of Tokyo and the Chinese Academy of Sciences.
Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets
High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems.
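The core mechanism behind this kind of semantic search can be sketched in a few lines: embed every sample into a vector, then rank samples by cosine similarity to a query embedding. The sketch below uses random stand-in vectors; in a real pipeline the embeddings would come from a transformer encoder (e.g. a CLIP-style image/text model), and the index would typically live in a vector database rather than a NumPy array.

```python
import numpy as np

def cosine_similarity(query, matrix):
    """Cosine similarity between a query vector and each row of a matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

# Stand-in embeddings: one 512-dim row per dataset sample
rng = np.random.default_rng(0)
index = rng.normal(size=(10_000, 512))

# Query with a near-duplicate of sample 42, simulating e.g. the
# embedding of a text prompt or an exemplar image
query = index[42] + 0.01 * rng.normal(size=512)

scores = cosine_similarity(query, index)
top_k = np.argsort(scores)[::-1][:5]
print(top_k[0])  # 42 -- the near-duplicate ranks first
```

The same ranking, run against real embeddings, is what surfaces near-duplicates, mislabeled samples, and edge cases: samples whose neighbors disagree with their labels are strong candidates for review.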
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning and more than 20 years of experience in the technology field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture. During her PhD and postdoctoral research, she deployed multiple low-cost smart edge and IoT computing technologies that can be operated without expertise in computer vision systems by users such as farmers. The central objective of Paula’s research has been to develop intelligent systems and machines that can understand and recreate the visual world around us to solve real-world needs, such as those of the agricultural industry.
Bridging the Gap in Explainable AI: Extracting and Building Datasets from Key Intermediate Layers
Explainable AI (XAI) often falls short at runtime, particularly when extracting concepts from intermediate layers without predefined labels. While current open-source tools focus on model explainability post hoc, they lack efficient dataset-building mechanisms from these crucial layers. This talk introduces a new open-source repository designed to seamlessly compute, store, and train on raw tensor data from intermediate layers—scaling from minimal compute to terabytes of data. By enabling structured dataset generation and improving mechanistic interpretability, this initiative pushes the boundaries of XAI, making it more practical and accessible for real-world applications.
About the Speaker
Syed Sha Qutub, an AI researcher at Intel Labs, specializes in explainability and model interpretability. With a strong background in deep learning and open-source contributions, he is currently working on bridging the gap between theoretical XAI and real-world applications. His early research focused on enhancing AI model resilience to platform errors like bit flips from alpha particles.
- April 24, 2025 - AI, Machine Learning and Computer Vision Meetup
This is a virtual event.
Towards a Multimodal AI Agent that Can See, Talk and Act
The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory observations but also act adaptively to achieve goals within their environments. In this talk, I will present my research journey toward this grand goal across three key dimensions.
First, I will explore how to bridge the gap between core vision understanding and multimodal learning through unified frameworks at various granularities. Next, I will discuss connecting vision-language models with large language models (LLMs) to create intelligent conversational systems. Finally, I will delve into recent advancements that extend multimodal LLMs into vision-language-action models, forming the foundation for general-purpose robotics policies. To conclude, I will highlight ongoing efforts to develop agentic systems that integrate perception with action, enabling them to not only understand observations but also take meaningful actions in a single system.
Together, these efforts point toward the next generation of multimodal AI agents capable of seeing, talking, and acting across diverse scenarios in both digital and physical worlds.
About the Speaker
Jianwei Yang is a Principal Researcher at Microsoft Research (MSR), Redmond. His research focuses on the intersection of vision and multimodal learning, with an emphasis on bridging core vision tasks with language, building general-purpose and promptable multimodal models, and enabling these models to take meaningful actions in both virtual and physical environments.
ConceptAttention: Interpreting the Representations of Diffusion Transformers
Recently, diffusion transformers have taken over as the state-of-the-art model class for both image and video generation. However, similar to many existing deep learning architectures, their high-dimensional hidden representations are difficult to understand and interpret. This lack of interpretability is a barrier to their controllability and safe deployment.
We introduce ConceptAttention, an approach to interpreting the representations of diffusion transformers. Our method allows users to create rich saliency maps depicting the location and intensity of textual concepts. Our approach exposes how a diffusion model “sees” a generated image and notably requires no additional training. ConceptAttention improves upon widely used approaches like cross-attention maps for isolating the location of visual concepts, and it even generalizes to real-world (not just generated) images and to video generation models!
Our work serves to improve the community’s understanding of how diffusion models represent data and has numerous potential applications, like image editing.
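As a loose illustration of the general idea behind concept saliency maps (not ConceptAttention’s actual implementation, which the talk presents), one can score each image patch’s features against a concept embedding and reshape the scores onto the patch grid; all names and dimensions below are illustrative:

```python
import numpy as np

def concept_saliency(patch_features, concept_embedding, grid=(16, 16)):
    """Score each image patch against a concept vector and reshape
    the cosine-similarity scores into a 2D map over the patch grid."""
    # Normalize so each score is a cosine similarity in [-1, 1]
    p = patch_features / np.linalg.norm(patch_features, axis=1, keepdims=True)
    c = concept_embedding / np.linalg.norm(concept_embedding)
    return (p @ c).reshape(grid)

rng = np.random.default_rng(1)
patches = rng.normal(size=(256, 768))  # 16x16 patches, 768-dim features
concept = rng.normal(size=768)         # stand-in for a concept embedding, e.g. "dog"
saliency = concept_saliency(patches, concept)
print(saliency.shape)  # (16, 16)
```

Upsampling such a patch-grid map back to the image resolution gives the heatmap-style visualization of where a textual concept appears.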
About the Speaker
Alec Helbling is a PhD student at Georgia Tech. His research focuses on improving the interpretability and controllability of generative models, particularly for image generation. His research is application-focused, and he has interned at a variety of industrial research labs, including Adobe Firefly, IBM Research, and NASA’s Jet Propulsion Laboratory. He also has a passion for creating explanatory videos about interesting machine learning and mathematical concepts.
RelationField: Relate Anything in Radiance Fields
Neural radiance fields recently emerged as a 3D scene representation extended by distilling open-vocabulary features from vision-language models. Current methods focus on object-centric tasks, leaving semantic relationships largely unexplored. We propose RelationField, the first method extracting inter-object relationships directly from neural radiance fields using pairs of rays for implicit relationship queries. RelationField distills relationship knowledge from multi-modal LLMs. Evaluated on open-vocabulary 3D scene graph generation and relationship-guided instance segmentation, RelationField achieves state-of-the-art performance.
About the Speaker
Sebastian Koch is a PhD student at Ulm University and the Bosch Center for Artificial Intelligence, supervised by Timo Ropinski of Ulm University. His main research interest lies at the intersection of computer vision and robotics. The goal of his PhD is to develop 3D scene representations of the real world that help robots navigate and solve tasks within their environment.
RGB-X Model Development: Exploring Four Channel ML Workflows
Machine learning is rapidly becoming multimodal. While many computer vision models are expanding into new modalities, one area that has quietly been advancing rapidly is RGB-X data, such as infrared, depth, or normals. In this talk, we will cover some of the leading models in this exploding field of visual AI and share best practices for working with these complex data formats!
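A common starting point for RGB-X work, sketched here under the assumption of a PyTorch model with a standard convolutional stem (this is a general heuristic, not a technique specific to the talk): extend a pretrained 3-channel first layer to accept a fourth channel, warm-starting the new filters from the mean of the RGB ones.

```python
import torch
import torch.nn as nn

# A typical RGB stem: the first conv expects 3 input channels
rgb_stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)

# Extend it to RGB-X (here RGB + depth = 4 channels): copy the
# existing RGB weights and initialize the extra channel with the
# mean of the RGB filters, a common warm-start heuristic
rgbx_stem = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
with torch.no_grad():
    rgbx_stem.weight[:, :3] = rgb_stem.weight
    rgbx_stem.weight[:, 3:] = rgb_stem.weight.mean(dim=1, keepdim=True)
    rgbx_stem.bias.copy_(rgb_stem.bias)

# Stack an RGB image with its aligned depth map along the channel axis
rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224)
out = rgbx_stem(torch.cat([rgb, depth], dim=1))
print(out.shape)  # torch.Size([1, 64, 112, 112])
```

The rest of the pretrained network is unchanged, so only the stem needs retraining from a rough initialization rather than from scratch.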
About the Speaker
Daniel Gural is a seasoned Machine Learning Evangelist with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Currently serving as a valuable member of Voxel51, he takes a leading role in efforts to bridge the gap between practitioners and the necessary tools, enabling them to achieve exceptional outcomes. Daniel’s extensive experience in teaching and developing within the ML field has fueled his commitment to democratizing high-quality AI workflows for a wider audience.
Past events (31)
- April 16 - Getting Started with FiftyOne Workshop (this event has passed)