Agenda:
• 18:30 - Doors open at the venue
• 19:00 - Welcome to PyBerlin! // Organisers
• 19:10 - Welcome from the host - ThoughtWorks
• 19:20 - Conquering PDFs: document understanding beyond plain text // Ines Montani
NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.
Speaker's bio:
Ines Montani is a developer specializing in tools for AI and NLP technology. She’s the co-founder and CEO of Explosion and a core developer of spaCy, a popular open-source library for Natural Language Processing in Python, and Prodigy, a modern annotation tool for creating training data for machine learning models.
• 19:50 - Short break
• 20:20 - Building EU AI Act-Compliant AI Agents for Legacy Systems // Aemal Sayer
In this talk, I’ll introduce a fully self-hosted, EU AI Act-compliant framework for building AI agents capable of operating any software system, legacy or modern, through its UI. Inspired by OpenAI’s Operator but designed for real-world compliance and flexibility, this framework combines LLMs, virtualization, and OS-level automation to let agents interact with applications as a human would: by clicking, typing, and navigating interfaces. Unlike API-bound or browser-focused tools, this solution enables true system-wide autonomy for AI agents, making integration with non-API systems not only possible but seamless. The framework also embraces a human-in-the-loop design. When an agent encounters an issue it can’t resolve, it notifies a human operator, who can then remotely connect to the agent’s virtual environment, intervene to unblock the task, and hand control back to the agent so it can continue its work.
Speaker's bio:
Aemal Sayer is a freelance AI engineer based in Berlin, Germany, with over 20 years of experience in software development and 8 years specializing in artificial intelligence. He works with small and medium-sized businesses to automate financial processes such as bookkeeping, invoicing, and compliance. His focus is on building privacy-first, self-hosted AI agents tailored to industries like e-commerce, logistics, and manufacturing. Currently, he’s building a GDPR- and EU AI Act-compliant agent that integrates with DATEV, Germany’s leading accounting platform, as a public, open-source project, aiming to drive transparency and innovation in AI-powered business automation.
• 20:50 - TBA // TBA
• 21:20 - Closing session // Organisers
This event will be in-person only. Please check our Code of Conduct and the official health regulations in Berlin before coming. If you notice any signs of illness, please consider skipping this event and attending another time. We will have plenty of events in different formats in the future.
Looking forward to seeing you all soon!