

What we’re about
To see all meetups in this group: https://www.meetup.com/pro/ibm-community/
This is an IBM-sponsored Meetup group geared towards developers, data scientists, data engineers, and ALL Big Data, Cloud and AI enthusiasts. Our meetups provide an opportunity to work hands-on with the solutions and tools in our Big Data portfolio and to interact and share knowledge with experts at IBM and in our extended community.
Our meetups typically include a 45-60 minute (max) presentation that serves as an introduction and overview for a specific Big Data technology. It is followed by ~3 hours to collaborate with fellow developers and apply your Big Data skills. Depending upon the location, we can provide a cloud environment that you can run through the browser of your laptop at NO cost to you. Our meetups are FREE.
Meetup topics include:
- Hadoop-based analytics
- Open Source Hadoop, SQL on Hadoop, R on Hadoop, Integration, Governance, ...
- Real Time Analytics & Stream Computing
- Text Analytics
- Visualization and Discovery tools for Big Data
- Big Data App Development
- Big Data & Cloud
- NoSQL
- Internet of Things (IoT)
- Deep dives into the technologies that make big data processing possible
- Anything and everything about Big Data
Join us today for a hands-on software development experience.
Upcoming events (3)
[AI Alliance] Workshop: Hands-on with Docling
Network event: 324 attendees from 109 groups hosting
Overview
When building machine learning and data applications, a significant portion of your time will be dedicated to data wrangling, from content extraction to data cleanup. This session introduces Docling, a robust, open source tool designed to handle many document formats, including PDF, DOCX, HTML and PPTX. Attendees will learn firsthand how to use Docling to extract and clean up data from various documents.
Description
Docling is a versatile document processor that handles various file types, including PDF, HTML, and DOCX. It can handle complex document structures such as tables and multi-column layouts. It can even extract text from scanned documents. Docling is open source and easy to use.
More about Docling: https://github.com/DS4SD/docling
Join us for this hands-on session to explore how to use Docling for your data needs.
In this workshop we will do the following:
- Getting started with Docling
- Extracting content from various documents (PDF / HTML)
- Handling table and image data
- Extracting content from scanned PDF documents using OCR (Optical Character Recognition)
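To give a feel for what "extracting content" from an HTML document involves, here is a minimal sketch that uses only the Python standard library. It is not Docling's API and does not reflect how Docling works internally; the class name `TextExtractor` and the function `extract_text` are hypothetical names chosen for illustration. A tool like Docling handles far more (tables, layout, PDFs, OCR) than this sketch does.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # text fragments in document order
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
    print(extract_text(sample))
```

The sketch flattens everything to lines of text, which loses structure such as headings and tables; preserving that structure is the kind of problem a dedicated document processor is meant to solve.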
What do you need to participate in this workshop?
- Comfort with the Python programming language
- We will run the workshop code using Google Colab (free) - no other setup is needed!
Session Type
Hands-on workshop
Audience
LLM app developers, data scientists, data engineers
Technical Level
Beginner - Intermediate
Prerequisites
- Comfort with the Python programming language
- We will run the workshop using Google Colab (free) - no other setup is needed!
Duration
45 mins
Industry
Cross industry
Speaker Bio
https://sujee.dev/bio
About the AI Alliance
The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
[AI Alliance] Workshop: Hands-on with Data Prep Kit
Network event: 164 attendees from 109 groups hosting
Overview
When building machine learning and data applications, a significant portion of your time will be dedicated to data wrangling, from content extraction and cleaning to de-duplication and filtering out problematic data. In this hands-on session we will explore Data Prep Kit, an open source toolkit designed to streamline these essential tasks. Attendees will learn firsthand how to use the Data Prep Kit to accelerate data preparation, improve overall data quality, and enhance the efficiency of building robust LLM applications.
Description
Data Prep Kit is a comprehensive Python library that democratizes and accelerates data preparation by providing out-of-the-box solutions for common tasks. Engineered to scale from a single laptop to large cloud clusters, it has been successfully used to process terabytes of data for training IBM Granite Large Language Models (LLMs).
Data Prep Kit offers a robust feature set including duplicate elimination, advanced document and code handling, language detection (for both spoken and programming languages), removal of personally identifiable information (PII), as well as spam, hate speech, and malware detection.
More about Data Prep Kit: https://github.com/IBM/data-prep-kit
Join us for this hands-on session to explore how to use Data Prep Kit to accelerate data preparation and enhance data quality.
In this workshop we will do the following:
- Get started with Data Prep Kit
- Extract content from various documents (PDFs, DOCX, HTML)
- Clean up documents by removing excess markup
- Detect and remove duplicate documents
- Detect and remove low-quality and spam documents
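As a rough illustration of the "detect and remove duplicate documents" step, the sketch below drops exact duplicates by hashing a normalized form of each document. It uses only the Python standard library and is not Data Prep Kit's API or its de-duplication algorithm; the function names `normalize`, `fingerprint` and `drop_exact_duplicates` are hypothetical. The normalization rule (collapse whitespace, lowercase) is an assumption made for this example.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so trivial variants match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def drop_exact_duplicates(documents):
    """Keep the first occurrence of each distinct (normalized) document."""
    seen = set()
    unique = []
    for doc in documents:
        key = fingerprint(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


if __name__ == "__main__":
    docs = ["Hello  World", "hello world", "Something else"]
    print(drop_exact_duplicates(docs))
```

Hashing only catches documents that are identical after normalization. Near-duplicates (for example, the same article with a different footer) need a similarity-based approach, which this sketch does not attempt.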
What do you need to participate in this workshop?
- Comfort with the Python programming language
- We will run the workshop code using Google Colab (free) - no other setup is needed!
Session Type
Hands-on workshop
Audience
LLM app developers, data scientists, data engineers
Technical Level
Beginner - Intermediate
Prerequisites
- Comfort with the Python programming language
- We will run the workshop using Google Colab (free) - no other setup is needed!
Duration
60 mins
Industry
Cross industry
Speaker Bio
https://sujee.dev/bio
About the AI Alliance
The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
[AI Alliance] Workshop: Preparing High Quality Datasets with Data Prep Kit
Network event: 147 attendees from 109 groups hosting
Overview
When building machine learning and data applications, a significant portion of your time will be dedicated to data wrangling, from content extraction to filtering out problematic and low-quality data. In this hands-on session we will explore Data Prep Kit, an open source toolkit designed to streamline these essential tasks. Attendees will learn firsthand how to use the Data Prep Kit to improve overall data quality by removing spam and low-quality documents, HAP (Hate Abuse Profanity) speech, and PII (Personally Identifiable Information) data, leading to a higher-quality dataset.
Description
Join us for an interactive, hands-on session where you will learn to clean up data and prepare high-quality datasets.
In this workshop we will do the following:
- Extract content from various documents (PDFs, HTML)
- Clean up and remove markup
- Detect and remove spam content
- Score and remove low-quality documents
- Identify and remove PII data
- Detect and remove HAP (Hate Abuse Profanity) speech from documents
More about Data Prep Kit: https://github.com/IBM/data-prep-kit
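To make the "identify and remove PII data" step in the list above concrete, here is a minimal regex-based sketch that masks email addresses and US-style phone numbers. It is not Data Prep Kit's API and is far less thorough than a real PII detector; the patterns, the placeholder tokens `<EMAIL>` and `<PHONE>`, and the function name `redact_pii` are all assumptions made for illustration.

```python
import re

# Deliberately simple patterns; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_pii(text: str) -> str:
    """Replace email addresses and 3-3-4 digit phone numbers with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.com or 555-123-4567."))
```

Masking with a placeholder, rather than deleting the match outright, keeps sentences readable and makes it easy to audit how much was redacted.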
What do you need to participate in this workshop?
- Comfort with the Python programming language
- We will run the workshop code using Google Colab (free) - no other setup is needed!
Session Type
Workshop (hands-on)
Audience
LLM app developers, data scientists, data engineers
Technical Level
Beginner - Intermediate
Prerequisites
- Comfort with the Python programming language
- We will run the workshop using Google Colab (free to use) - no other setup is needed!
Duration
60 mins
Industry
Cross industry
Speaker Bio
https://sujee.dev/bio
About the AI Alliance
The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
Past events (182)
[AI Alliance] Introducing GneissWeb: A State-Of-The-Art LLM Pre-training Dataset
Network event: 491 attendees from 109 groups hosting