
What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (4+)
- April 22 - Convolutional Neural Networks Workshop
When and Where
- April 22, 2025
- 6:30 PM to 8:30 PM CET | 9:30 AM to 11:30 AM Pacific
- Workshops are delivered over Zoom
About the Workshop
Join us for a 12-part, hands-on series that teaches you how to work with images, build and train models, and explore tasks like image classification, segmentation, object detection, and image generation. Each session combines straightforward explanations with practical coding in PyTorch and FiftyOne, allowing you to learn core skills in computer vision and apply them to real-world tasks.
In this session, we’ll delve deeper into CNN architectures, focusing on upsampling, channel mixing, and semantic segmentation techniques. You’ll build a U-Net model for semantic image segmentation and inspect its predictions with FiftyOne.
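As a rough sketch of the ideas the session covers (not the workshop’s actual code; the class name and dimensions below are illustrative), a single U-Net decoder stage combines transposed-convolution upsampling with channel mixing over concatenated skip features:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """One U-Net decoder stage: upsample, concatenate the encoder's
    skip connection, then mix channels with 3x3 convolutions."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Transposed convolution doubles the spatial resolution
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # After concatenating skip features, mix channels back down
        self.mix = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # (N, out_ch, 2H, 2W)
        x = torch.cat([x, skip], dim=1)  # channel-wise concat with encoder features
        return self.mix(x)

# Example: decode 256-channel 16x16 features using a 128-channel 32x32 skip
block = UpBlock(in_ch=256, skip_ch=128, out_ch=128)
x = torch.randn(1, 256, 16, 16)
skip = torch.randn(1, 128, 32, 32)
out = block(x, skip)
print(out.shape)  # torch.Size([1, 128, 32, 32])
```

A full U-Net stacks several such stages symmetrically against an encoder, ending in a 1x1 convolution that maps to one channel per segmentation class.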
These are hands-on maker workshops that use GitHub Codespaces, Kaggle notebooks, and Google Colab environments, so no local installation is required (though you are welcome to work locally if you prefer).
Workshop Resources
You can find the workshop materials in this GitHub repository.
About the Instructor
Antonio Rueda-Toicen, an AI Engineer in Berlin, has extensive experience in deploying machine learning models and has taught over 300 professionals. He is currently a Research Scientist at the Hasso Plattner Institute. Since 2019, he has organized the Berlin Computer Vision Group and taught at Berlin’s Data Science Retreat. He specializes in computer vision, cloud technologies, and machine learning. Antonio is also a certified instructor of deep learning and diffusion models in NVIDIA’s Deep Learning Institute.
- April 23 - Advanced Computer Vision Data Curation and Model Evaluation Workshop
When and Where
April 23, 2025 | 9:00 – 10:30 AM Pacific
About the Workshop
Are you looking for simpler and more accurate ways to perform common data curation and model evaluation tasks for your computer vision workflows?
Then this workshop with Harpreet Sahota is for you! In this 90-minute hands-on workshop, we’ll show you how to use FiftyOne’s panel and plugin framework to:
- Customize the FiftyOne App to work the way you want to work
- Quickly integrate FiftyOne with new models, datasets, and MLOps tools
- Automate common data curation and model evaluation tasks
- Streamline your computer vision workflows with less code and more clicks
Whether you are a beginner or an advanced user of FiftyOne, and whether you want to start customizing the dozens of existing plugins or create your own, there will be something for you in this workshop!
Prerequisites
A working knowledge of Python and basic familiarity with FiftyOne. All attendees will get access to the tutorials, videos, and code examples used in the workshop.
- April 24 - Munich AI, Machine Learning and Computer Vision Meetup (Impact Hub Munich GmbH, München)
Date and Time
April 24, 2025 from 5:30 PM to 8:30 PM
Location
Impact Hub Munich, Gotzinger Str. 8, 81371 Munich
Industrial Anomaly Detection – Challenges and Opportunities
Deep learning-based anomaly detection plays a key role in visual quality inspection and has received growing attention from the research community in recent years. However, reliably detecting anomalies remains a challenging problem. This talk provides an overview of the current state of the field, discussing recent progress, ongoing challenges, and potential future directions. We will explore both the limitations of existing approaches and opportunities for further improvement in real-world applications.
About the Speakers
Lars Heckler-Kram studied mechatronics and robotics at the Technical University of Munich (TUM) and received his Master’s degree in 2022. He is currently working toward his PhD with MVTec Software GmbH and the TUM School of Computation, Information and Technology. His research focuses on industrial visual anomaly detection.
Jan-Hendrik Neudeck received his Master’s degree in Computer Science with a specialization in computer vision and deep learning from the Technical University of Munich (TUM) in 2020. Since then, he has been working as a research engineer at MVTec Software GmbH.
Human Motion Prediction – Enhanced Realism via Nonisotropic Gaussian Diffusion
Predicting future human motion is a key challenge in generative AI and computer vision, as generated motions should be realistic and diverse at the same time. This talk presents a novel approach that leverages top-performing latent generative diffusion models with a novel paradigm. Nonisotropic Gaussian diffusion leads to better performance, fewer parameters, and faster training at no additional computational cost. We will also discuss how such benefits can be obtained in other application domains.
About the Speaker
Cecilia Curreli is a PhD student at the Technical University of Munich, specializing in generative models. A member of the AI Competence Center at MCML, she has conducted research in deep learning, computer vision, and quantum physics through international collaborations with the University of Tokyo and the Chinese Academy of Sciences.
Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets
High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems.
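The core mechanism behind this kind of semantic search can be sketched in a few lines: embed every sample into a vector, then rank samples by cosine similarity to a query embedding. The sketch below uses random stand-in vectors; in a real pipeline the embeddings would come from a transformer encoder (e.g. a CLIP-style image/text model), and the index would typically live in a vector database rather than a NumPy array.

```python
import numpy as np

def cosine_similarity(query, matrix):
    """Cosine similarity between a query vector and each row of a matrix."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

# Stand-in embeddings: one 512-dim row per dataset sample
rng = np.random.default_rng(0)
index = rng.normal(size=(10_000, 512))

# Query with a near-duplicate of sample 42, simulating e.g. the
# embedding of a text prompt or an exemplar image
query = index[42] + 0.01 * rng.normal(size=512)

scores = cosine_similarity(query, index)
top_k = np.argsort(scores)[::-1][:5]
print(top_k[0])  # 42 -- the near-duplicate ranks first
```

The same ranking, run against real embeddings, is what surfaces near-duplicates, mislabeled samples, and edge cases: samples whose neighbors disagree with their labels are strong candidates for review.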
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning and more than 20 years of experience in the technology field. Since the early 2000s in Colombia, she has been developing novel integrated engineering technologies, mainly in computer vision, robotics, and machine learning applied to agriculture. During her PhD and postdoctoral research, she deployed multiple low-cost smart edge and IoT computing technologies that can be operated without expertise in computer vision systems by users such as farmers. The central objective of Paula’s research has been to develop intelligent systems and machines that can understand and recreate the visual world around us to solve real-world needs, such as those of the agricultural industry.
Bridging the Gap in Explainable AI: Extracting and Building Datasets from Key Intermediate Layers
Explainable AI (XAI) often falls short at runtime, particularly when extracting concepts from intermediate layers without predefined labels. While current open-source tools focus on model explainability post hoc, they lack efficient dataset-building mechanisms from these crucial layers. This talk introduces a new open-source repository designed to seamlessly compute, store, and train on raw tensor data from intermediate layers—scaling from minimal compute to terabytes of data. By enabling structured dataset generation and improving mechanistic interpretability, this initiative pushes the boundaries of XAI, making it more practical and accessible for real-world applications.
About the Speaker
Syed Sha Qutub, an AI researcher at Intel Labs, specializes in explainability and model interpretability. With a strong background in deep learning and open-source contributions, he is currently working on bridging the gap between theoretical XAI and real-world applications. His early research focused on enhancing AI model resilience to platform errors like bit flips from alpha particles.
- April 24, 2025 - AI, Machine Learning and Computer Vision Meetup
This is a virtual event.
Towards a Multimodal AI Agent that Can See, Talk and Act
The development of multimodal AI agents marks a pivotal step toward creating systems capable of understanding, reasoning, and interacting with the world in human-like ways. Building such agents requires models that not only comprehend multi-sensory observations but also act adaptively to achieve goals within their environments. In this talk, I will present my research journey toward this grand goal across three key dimensions.
First, I will explore how to bridge the gap between core vision understanding and multimodal learning through unified frameworks at various granularities. Next, I will discuss connecting vision-language models with large language models (LLMs) to create intelligent conversational systems. Finally, I will delve into recent advancements that extend multimodal LLMs into vision-language-action models, forming the foundation for general-purpose robotics policies. To conclude, I will highlight ongoing efforts to develop agentic systems that integrate perception with action, enabling them to not only understand observations but also take meaningful actions in a single system.
Together, these efforts point toward the next generation of multimodal AI agents capable of seeing, talking, and acting across diverse scenarios in both digital and physical worlds.
About the Speaker
Jianwei Yang is a Principal Researcher at Microsoft Research (MSR), Redmond. His research focuses on the intersection of vision and multimodal learning, with an emphasis on bridging core vision tasks with language, building general-purpose and promptable multimodal models, and enabling these models to take meaningful actions in both virtual and physical environments.
ConceptAttention: Interpreting the Representations of Diffusion Transformers
Recently, diffusion transformers have taken over as the state-of-the-art model class for both image and video generation. However, similar to many existing deep learning architectures, their high-dimensional hidden representations are difficult to understand and interpret. This lack of interpretability is a barrier to their controllability and safe deployment.
We introduce ConceptAttention, an approach to interpreting the representations of diffusion transformers. Our method allows users to create rich saliency maps depicting the location and intensity of textual concepts. Our approach exposes how a diffusion model “sees” a generated image and notably requires no additional training. ConceptAttention improves upon widely used approaches like cross-attention maps for isolating the location of visual concepts, and it even generalizes to real-world (not just generated) images and to video generation models!
Our work serves to improve the community’s understanding of how diffusion models represent data and has numerous potential applications, like image editing.
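As a loose illustration of the general idea behind concept saliency maps (not ConceptAttention’s actual implementation, which the talk presents), one can score each image patch’s features against a concept embedding and reshape the scores onto the patch grid; all names and dimensions below are illustrative:

```python
import numpy as np

def concept_saliency(patch_features, concept_embedding, grid=(16, 16)):
    """Score each image patch against a concept vector and reshape
    the cosine-similarity scores into a 2D map over the patch grid."""
    # Normalize so each score is a cosine similarity in [-1, 1]
    p = patch_features / np.linalg.norm(patch_features, axis=1, keepdims=True)
    c = concept_embedding / np.linalg.norm(concept_embedding)
    return (p @ c).reshape(grid)

rng = np.random.default_rng(1)
patches = rng.normal(size=(256, 768))  # 16x16 patches, 768-dim features
concept = rng.normal(size=768)         # stand-in for a concept embedding, e.g. "dog"
saliency = concept_saliency(patches, concept)
print(saliency.shape)  # (16, 16)
```

Upsampling such a patch-grid map back to the image resolution gives the heatmap-style visualization of where a textual concept appears.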
About the Speaker
Alec Helbling is a PhD student at Georgia Tech. His research focuses on improving the interpretability and controllability of generative models, particularly for image generation. His research is application-focused, and he has interned at a variety of industrial research labs, including Adobe Firefly, IBM Research, and NASA’s Jet Propulsion Laboratory. He also has a passion for creating explanatory videos about interesting machine learning and mathematical concepts.
RelationField: Relate Anything in Radiance Fields
Neural radiance fields recently emerged as a 3D scene representation extended by distilling open-vocabulary features from vision-language models. Current methods focus on object-centric tasks, leaving semantic relationships largely unexplored. We propose RelationField, the first method extracting inter-object relationships directly from neural radiance fields using pairs of rays for implicit relationship queries. RelationField distills relationship knowledge from multi-modal LLMs. Evaluated on open-vocabulary 3D scene graph generation and relationship-guided instance segmentation, RelationField achieves state-of-the-art performance.
About the Speaker
Sebastian Koch is a PhD student at Ulm University and the Bosch Center for Artificial Intelligence, supervised by Timo Ropinski of Ulm University. His main research interest lies at the intersection of computer vision and robotics. The goal of his PhD is to develop 3D scene representations of the real world that help robots navigate and solve tasks within their environment.
RGB-X Model Development: Exploring Four Channel ML Workflows
Machine learning is rapidly becoming multimodal. While many computer vision models are expanding into new modalities, one area that has quietly been advancing rapidly is RGB-X data, such as infrared, depth, or normals. In this talk, we will cover some of the leading models in this exploding field of visual AI and share best practices for working with these complex data formats!
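A common starting point for RGB-X work, sketched here under the assumption of a PyTorch model with a standard convolutional stem (this is a general heuristic, not a technique specific to the talk): extend a pretrained 3-channel first layer to accept a fourth channel, warm-starting the new filters from the mean of the RGB ones.

```python
import torch
import torch.nn as nn

# A typical RGB stem: the first conv expects 3 input channels
rgb_stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)

# Extend it to RGB-X (here RGB + depth = 4 channels): copy the
# existing RGB weights and initialize the extra channel with the
# mean of the RGB filters, a common warm-start heuristic
rgbx_stem = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3)
with torch.no_grad():
    rgbx_stem.weight[:, :3] = rgb_stem.weight
    rgbx_stem.weight[:, 3:] = rgb_stem.weight.mean(dim=1, keepdim=True)
    rgbx_stem.bias.copy_(rgb_stem.bias)

# Stack an RGB image with its aligned depth map along the channel axis
rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224)
out = rgbx_stem(torch.cat([rgb, depth], dim=1))
print(out.shape)  # torch.Size([1, 64, 112, 112])
```

The rest of the pretrained network is unchanged, so only the stem needs retraining from a rough initialization rather than from scratch.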
About the Speaker
Daniel Gural is a seasoned Machine Learning Evangelist with a strong passion for empowering Data Scientists and ML Engineers to unlock the full potential of their data. Currently serving as a valuable member of Voxel51, he takes a leading role in efforts to bridge the gap between practitioners and the necessary tools, enabling them to achieve exceptional outcomes. Daniel’s extensive experience in teaching and developing within the ML field has fueled his commitment to democratizing high-quality AI workflows for a wider audience.
Past events (31)
- April 16 - Getting Started with FiftyOne Workshop (this event has passed)