* Salary: R$ 11,000 to R$ 20,000 per month (estimated)
* The amount shown is an estimate based on public data and market benchmarks. We do not guarantee that it reflects the salary offered for this specific position.
Area: Information Technology
Level: Senior
What makes us Confidencial (Apenas para Cadastrados)?
A Gartner® Magic Quadrant™ Leader for 15 years in a row, Confidencial (Apenas para Cadastrados) transforms complex data landscapes into actionable insights, driving strategic business outcomes. Serving over 40,000 global customers, our portfolio leverages pervasive data quality and advanced AI/ML capabilities that lead to better decisions, faster.
We excel in integration and governance solutions that work with diverse data sources, and our real-time analytics uncover hidden patterns, empowering teams to address complex challenges and seize new opportunities.
The Senior Data Engineer – AI Infrastructure & Data Pipelines
Confidencial (Apenas para Cadastrados) is seeking a high-performing, builder-minded data engineer to serve as a key execution partner in building our 2026 internal AI infrastructure. This role goes beyond traditional maintenance into foundational, creative engineering. You will be responsible for the end-to-end "plumbing" of our model orchestration layers and for migrating our enterprise data estate to an open, interoperable Lakehouse architecture.
As a "Customer Zero" engineer, you will use Confidencial (Apenas para Cadastrados)’s own Talend Cloud and Open Lakehouse products to build resilient, enterprise-grade AI systems. You will work on a small, elite team where high-speed innovation and a "fail-fast" mentality are celebrated. This position calls for a technically curious mind, ready to tackle the complexities of Apache Iceberg, the Model Context Protocol (MCP), and Knowledge Graph data flows for retrieval-augmented generation (RAG).
What makes this role interesting?
This role sits at the intersection of data engineering, AI infrastructure, and open data platforms, giving you the opportunity to design systems that directly enable advanced AI workflows across the organization.
- Build a modern open Lakehouse architecture: Lead the migration of enterprise data, including data from Salesforce and other core enterprise applications, into an Apache Iceberg Lakehouse, implementing Lake Landing patterns, scalable ingestion strategies based on CDC and incremental pipelines, schema governance, and the delivery of AI-ready curated datasets.
- Design infrastructure connecting enterprise data and AI systems: Build, host, and secure Model Context Protocol (MCP) servers on Confidencial (Apenas para Cadastrados)’s SaaS platform, creating a robust bridge between internal data products and third-party LLM agents.
- Engineer high-performance RAG data pipelines: Develop and optimize pipelines supporting vector databases and retrieval systems, implementing semantic chunking, embedding generation, and CDC-driven embedding updates to ensure data freshness and accuracy for AI workflows.
- Optimize performance and cost efficiency at scale: Use advanced Iceberg table-maintenance capabilities, including the Adaptive Iceberg Optimizer, to automate table compaction and metadata cleanup, improving query performance while reducing compute consumption.
- Build secure and governed AI data systems: Integrate security-by-design principles into agentic AI workflows, protecting enterprise data and mitigating risks such as confused deputy attacks and unauthorized data access.
- Work on cutting-edge AI data architectures: Contribute to the design of knowledge graph pipelines, vector indexing systems, and context-aware data flows that power autonomous AI agents and multi-step agentic AI workflows.
- Collaborate with global AI engineering teams: Partner closely with the Principal Data Engineer and collaborate with engineering teams across the United States, India, and other global hubs to align local engineering execution with Confidencial (Apenas para Cadastrados)’s global AI strategy.
Here’s how you’ll be making an impact:
Your work will directly influence how AI systems access, process, and utilize enterprise data across Confidencial (Apenas para Cadastrados)’s global platform.
- Accelerating enterprise AI adoption: Deliver robust pipelines and infrastructure that enable teams to deploy RAG-powered applications and agentic AI workflows with reliable, context-aware data.
- Migrating enterprise data to a scalable Lakehouse platform: Drive the transition of Salesforce and other enterprise application data into the Apache Iceberg Lakehouse, ensuring scalable ingestion, schema governance, and high-quality curated datasets.
- Improving data platform performance and efficiency: Optimize ingestion pipelines, compute usage, and lakehouse query performance, improving operational efficiency and reducing infrastructure costs.
- Delivering AI-ready datasets at scale: Prepare datasets with semantic chunking, embeddings, and freshness guarantees, enabling reliable LLM and RAG workflows across the organization.
- Maintaining strong data governance and quality: Ensure data accuracy, completeness, schema compliance, and governance across lakehouse tables, maintaining trust in enterprise data systems.
- Supporting global AI teams: Deliver data pipelines on time and ensure data usability for AI, analytics, and platform teams across multiple regions and time zones.
- Strengthening security and compliance across AI data systems: Apply enterprise security best practices including row-level and column-level controls, PII protection, and enterprise data governance frameworks.
We’re looking for a teammate with:
Required
- Iceberg Mastery: Deep experience with Apache Iceberg v2, manifest management, hidden partitioning, and schema evolution.
- Cloud Data Platforms: Advanced proficiency in Snowflake (External Volumes, Open Catalog), Amazon S3, and AWS EC2.
- Programming: Mastery of Python (FastMCP, PySpark) and SQL optimization.
- Enterprise Data Integration: Experience migrating data from Salesforce and other enterprise applications using APIs, bulk exports, and CDC streams.
- Infrastructure: Proficiency with Docker, Kubernetes, and Helm for deploying scalable, containerized MCP servers.
- Semantic Data Modeling: Expertise in semantic modeling for LLMs, including Knowledge Graph construction (Neo4j) and vector indexing.
- Data Modeling & Governance: Experience with schema evolution, metadata management, and maintaining data consistency and quality.
- Security & Compliance: Knowledge of row-level and column-level security, PII/CRM data protection, and enterprise data governance best practices.
- Language: Expert-level English (Native or Professional) for global collaboration across time zones.
The location for this role is:
Office location: São Paulo, Brazil
Apply now and help change how the world turns complex data into actionable insights and new opportunities!
More about Confidencial (Apenas para Cadastrados) and who we are:
Find out more about ‘Life at Confidencial (Apenas para Cadastrados)’ on Instagram, LinkedIn, YouTube, and X/Twitter. To see other opportunities to join us and learn about our values, check out our Careers Page.
What else do we offer?
- Genuine career progression pathways and mentoring programs.
- Culture of innovation, technology, collaboration, and openness.
- Flexible, diverse, and international work environment.
Giving back is a huge part of our culture. In addition to an extra “change the world” day and another day for personal development, we strongly encourage participation in our Corporate Responsibility Employee Programs.
If you need assistance applying for a role due to a disability, please submit your request via email to accessibilityta@Confidencial (Apenas para Cadastrados).com. Any information you provide will be treated according to Confidencial (Apenas para Cadastrados)’s Recruitment Privacy Notice. Confidencial (Apenas para Cadastrados) may only respond to emails related to accommodation requests.
Confidencial (Apenas para Cadastrados) is not accepting unsolicited assistance from search firms for this employment opportunity. Please, no phone calls or emails. All resumes submitted by search firms to any employee at Confidencial (Apenas para Cadastrados) via email, the Internet, or any other form or method without a valid written search agreement in place for this position will be deemed the sole property of Confidencial (Apenas para Cadastrados). No fee will be paid if the candidate is hired by Confidencial (Apenas para Cadastrados) as a result of the referral or through other means.
Work Location: Hybrid remote in Vila Olímpia, SP
