
What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (4+)
April 22 - Convolutional Neural Networks Workshop
When and Where
- April 22, 2025
- 6:30 PM to 8:30 PM CET | 9:30 AM to 11:30 AM Pacific
- Workshops are delivered over Zoom
About the Workshop
Join us for a 12-part, hands-on series that teaches you how to work with images, build and train models, and explore tasks like image classification, segmentation, object detection, and image generation. Each session combines straightforward explanations with practical coding in PyTorch and FiftyOne, allowing you to learn core skills in computer vision and apply them to real-world tasks.
In this session, we’ll delve deeper into CNN architectures, focusing on upsampling, channel mixing, and semantic segmentation techniques. You’ll build a U-Net model for semantic image segmentation and inspect its predictions with FiftyOne.
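For a flavor of what we’ll build (a minimal sketch, not the actual workshop code; names and shapes are illustrative), here is a U-Net-style decoder block in PyTorch that combines upsampling, a skip connection, and channel-mixing convolutions:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One U-Net decoder step: upsample, concatenate the encoder
    skip connection, then mix channels with convolutions."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.mix = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # double the spatial resolution
        x = torch.cat([x, skip], dim=1)  # fuse encoder features
        return self.mix(x)               # channel mixing

# Toy usage: upsample 16x16 features to 32x32 and fuse a skip connection
block = UpBlock(in_ch=128, skip_ch=64, out_ch=64)
y = block(torch.randn(1, 128, 16, 16), torch.randn(1, 64, 32, 32))

# A 1x1 convolution is pure channel mixing: here it maps features
# to per-pixel class logits (21 classes is an illustrative choice)
head = nn.Conv2d(64, 21, kernel_size=1)
logits = head(y)
```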
These are hands-on maker workshops that make use of GitHub Codespaces, Kaggle notebooks, and Google Colab environments, so no local installation is required (though you are welcome to work locally if preferred!).
Workshop Resources
You can find the workshop materials in this GitHub repository.
About the Instructor
Antonio Rueda-Toicen, an AI Engineer in Berlin, has extensive experience in deploying machine learning models and has taught over 300 professionals. He is currently a Research Scientist at the Hasso Plattner Institute. Since 2019, he has organized the Berlin Computer Vision Group and taught at Berlin’s Data Science Retreat. He specializes in computer vision, cloud technologies, and machine learning. Antonio is also a certified instructor of deep learning and diffusion models in NVIDIA’s Deep Learning Institute.
April 23 - Advanced Computer Vision Data Curation and Model Evaluation Workshop
When and Where
- April 23, 2025
- 9:00 AM to 10:30 AM Pacific
About the Workshop
Are you looking for simpler and more accurate ways to perform common data curation and model evaluation tasks for your computer vision workflows?
Then this workshop with Harpreet Sahota is for you! In this 90-minute hands-on workshop, we’ll use FiftyOne’s panel and plugin framework to show you how to:
- Customize the FiftyOne App to work the way you want to work
- Quickly integrate FiftyOne with new models, datasets, and MLOps tools
- Automate common data curation and model evaluation tasks
- Streamline your computer vision workflows with less code and more clicks
Whether you’re a beginner or an advanced FiftyOne user, just getting started with the dozens of existing plugins or interested in creating your own, there will be something for you in this workshop!
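To give a sense of the starting point (a minimal sketch using FiftyOne’s public API, not the exact workshop code), here is how you might load a dataset, run a detection evaluation, and open the App:

```python
import fiftyone as fo
import fiftyone.zoo as foz

# Load a small sample dataset that ships with ground truth and predictions
dataset = foz.load_zoo_dataset("quickstart")

# Evaluate the predictions against ground truth (COCO-style detection eval)
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)
results.print_report()

# Explore the evaluation results interactively in the FiftyOne App
session = fo.launch_app(dataset)
```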
Prerequisites
A working knowledge of Python and basic familiarity with FiftyOne. All attendees will get access to the tutorials, videos, and code examples used in the workshop.
April 24, 2025 - AI, Machine Learning and Computer Vision Meetup
This is a virtual event.
Towards a Multimodal AI Agent that Can See, Talk and Act
The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory observations but also act adaptively to achieve goals within their environments. In this talk, I will present my research journey toward this grand goal across three key dimensions.
First, I will explore how to bridge the gap between core vision understanding and multimodal learning through unified frameworks at various granularities. Next, I will discuss connecting vision-language models with large language models (LLMs) to create intelligent conversational systems. Finally, I will delve into recent advancements that extend multimodal LLMs into vision-language-action models, forming the foundation for general-purpose robotics policies. To conclude, I will highlight ongoing efforts to develop agentic systems that integrate perception with action, enabling them to not only understand observations but also take meaningful actions in a single system.
Together, these lead to an aspiration of building the next generation of multimodal AI agents capable of seeing, talking, and acting across diverse scenarios in both digital and physical worlds.
About the Speaker
Jianwei Yang is a Principal Researcher at Microsoft Research (MSR), Redmond. His research focuses on the intersection of vision and multimodal learning, with an emphasis on bridging core vision tasks with language, building general-purpose and promptable multimodal models, and enabling these models to take meaningful actions in both virtual and physical environments.
ConceptAttention: Interpreting the Representations of Diffusion Transformers
Recently, diffusion transformers have taken over as the state-of-the-art model class for both image and video generation. However, similar to many existing deep learning architectures, their high-dimensional hidden representations are difficult to understand and interpret. This lack of interpretability is a barrier to their controllability and safe deployment.
We introduce ConceptAttention, an approach to interpreting the representations of diffusion transformers. Our method allows users to create rich saliency maps depicting the location and intensity of textual concepts. Our approach exposes how a diffusion model “sees” a generated image and notably requires no additional training. ConceptAttention improves upon widely used approaches like cross-attention maps for isolating the location of visual concepts, and it even generalizes to real-world (not just generated) images and to video generation models!
Our work serves to improve the community’s understanding of how diffusion models represent data and has numerous potential applications, like image editing.
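As a rough illustration of the general idea (a simplified sketch under our own assumptions, not the paper’s actual implementation), a concept saliency map can be formed by scoring transformer patch features against a textual concept embedding and reshaping the scores into a spatial grid:

```python
import torch
import torch.nn.functional as F

def concept_saliency(patch_feats, concept_emb, grid_size):
    """Generic attention-style saliency: score each image patch
    against a concept embedding and reshape into a 2D map.

    patch_feats: (num_patches, dim) hidden states from a transformer
                 layer (hypothetical extraction point)
    concept_emb: (dim,) embedding of a textual concept, e.g. "dog"
    """
    scores = patch_feats @ concept_emb                        # (num_patches,)
    scores = F.softmax(scores / patch_feats.shape[-1] ** 0.5, dim=0)
    return scores.reshape(grid_size, grid_size)               # 2D saliency map

# Toy shapes: a 16x16 patch grid with 768-dim features
saliency = concept_saliency(torch.randn(256, 768), torch.randn(768), 16)
```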
About the Speaker
Alec Helbling is a PhD student at Georgia Tech. His research focuses on improving the interpretability and controllability of generative models, particularly for image generation. His research is application-focused, and he has interned at a variety of industrial research labs, including Adobe Firefly, IBM Research, and NASA Jet Propulsion Lab. He also has a passion for creating explanatory videos of interesting machine learning and mathematical concepts.
RelationField: Relate Anything in Radiance Fields
Neural radiance fields recently emerged as a 3D scene representation extended by distilling open-vocabulary features from vision-language models. Current methods focus on object-centric tasks, leaving semantic relationships largely unexplored. We propose RelationField, the first method extracting inter-object relationships directly from neural radiance fields using pairs of rays for implicit relationship queries. RelationField distills relationship knowledge from multi-modal LLMs. Evaluated on open-vocabulary 3D scene graph generation and relationship-guided instance segmentation, RelationField achieves state-of-the-art performance.
About the Speaker
Sebastian Koch is a PhD student at Ulm University and the Bosch Center for Artificial Intelligence. He is supervised by Timo Ropinski from Ulm University. His main research interest lies at the intersection of computer vision and robotics. The goal of his PhD is to develop 3D scene representations of the real world that enable robots to navigate and solve tasks within their environment.
RGB-X Model Development: Exploring Four Channel ML Workflows
Machine learning is rapidly becoming multimodal. While many computer vision models are expanding into new modalities such as 3D, one area that has been quietly but rapidly advancing is RGB-X data, such as infrared, depth, or normals. In this talk we will cover some of the leading models in this exploding field of Visual AI and share best practices for working with these complex data formats!
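One recurring pattern in four-channel workflows is inflating a pretrained model’s first convolution to accept the extra channel. Here is a minimal sketch (not code from the talk), assuming a torchvision ResNet18 backbone and depth as the fourth channel:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Start from an RGB-pretrained backbone
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Build a new first conv that accepts 4 channels (RGB + depth)
old = model.conv1
new = nn.Conv2d(4, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)

with torch.no_grad():
    new.weight[:, :3] = old.weight                            # reuse RGB filters
    new.weight[:, 3:] = old.weight.mean(dim=1, keepdim=True)  # init X channel

model.conv1 = new
model.eval()

# Four-channel forward pass on a single RGB-D image
out = model(torch.randn(1, 4, 224, 224))
```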
About the Speaker
Daniel Gural is a seasoned Machine Learning Evangelist with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Currently serving as a valuable member of Voxel51, he takes a leading role in efforts to bridge the gap between practitioners and the necessary tools, enabling them to achieve exceptional outcomes. Daniel’s extensive experience in teaching and developing within the ML field has fueled his commitment to democratizing high-quality AI workflows for a wider audience.
April 29 - Model Optimization: Data Augmentation & Regularization Workshop
When and Where
- April 29, 2025
- 6:30 PM to 8:30 PM CET | 9:30 AM to 11:30 AM Pacific
- Workshops are delivered over Zoom
About the Workshop
Join us for a 12-part, hands-on series that teaches you how to work with images, build and train models, and explore tasks like image classification, segmentation, object detection, and image generation. Each session combines straightforward explanations with practical coding in PyTorch and FiftyOne, allowing you to learn core skills in computer vision and apply them to real-world tasks.
In this session, we’ll introduce optimization strategies including data augmentation, dropout, batch normalization, and transfer learning. You’ll implement an augmented network on a fruits dataset using models like VGG-16 and ResNet18, and analyze the results with FiftyOne.
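As a preview of the techniques (a minimal sketch with illustrative parameters, not the session’s actual notebook), here is a torchvision augmentation pipeline plus a transfer-learning setup with a dropout-regularized classifier head:

```python
import torch.nn as nn
from torchvision import transforms
from torchvision.models import resnet18, ResNet18_Weights

# Data augmentation: random crops, flips, and color jitter regularize training
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Transfer learning: freeze the pretrained features, retrain the head
model = resnet18(weights=ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

num_classes = 10  # hypothetical: e.g., 10 fruit categories
model.fc = nn.Sequential(
    nn.Dropout(p=0.5),  # dropout regularization on the new head
    nn.Linear(model.fc.in_features, num_classes),
)
```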
These are hands-on maker workshops that make use of GitHub Codespaces, Kaggle notebooks, and Google Colab environments, so no local installation is required (though you are welcome to work locally if preferred!).
Workshop Resources
You can find the workshop materials in this GitHub repository.
About the Instructor
Antonio Rueda-Toicen, an AI Engineer in Berlin, has extensive experience in deploying machine learning models and has taught over 300 professionals. He is currently a Research Scientist at the Hasso Plattner Institute. Since 2019, he has organized the Berlin Computer Vision Group and taught at Berlin’s Data Science Retreat. He specializes in computer vision, cloud technologies, and machine learning. Antonio is also a certified instructor of deep learning and diffusion models in NVIDIA’s Deep Learning Institute.