Senior Manager, Software Engineering (Distributed Systems + Machine Learning)

Salesforce

Software Engineering, Data Science
Hyderabad, Telangana, India
Posted on Feb 11, 2026

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn’t a buzzword — it’s a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.

Ready to level-up your career at the company leading workforce transformation in the agentic era? You’re in the right place! Agentforce is the future of AI, and you are the future of Salesforce.

The Salesforce Monitoring Cloud team is looking for an engineering leader with deep expertise in AI/ML and analytical modeling for infrastructure systems to lead high-caliber engineering teams building the next generation of intelligent availability monitoring platforms at massive scale.

Monitoring Cloud is a foundational part of Salesforce Infrastructure that ensures the reliability and availability of Salesforce products globally. We own the entire telemetry stack — from lightweight agents that emit metrics, logs, traces and events, to large-scale distributed backend systems that process petabytes of telemetry data in real time across public cloud environments.

We are building advanced machine learning–driven detection and analytical systems that power proactive incident identification, anomaly detection, capacity forecasting, failure prediction and automated remediation across distributed cloud infrastructure. This role focuses on deep applied ML and statistical modeling for infrastructure observability — not generative AI or enterprise application ML.

As an engineering leader, you will drive the architecture and implementation of scalable ML systems embedded directly into our monitoring cloud. You will lead teams building analytical pipelines, detection models, signal correlation engines and intelligent automation systems that improve availability, reduce MTTR and enhance system resilience. You are equally passionate about technical depth, operational excellence, and building high-performing teams.

Responsibilities

  • Define and drive the vision for ML-powered infrastructure observability with a focus on Availability, Reliability, Detection Accuracy and Operational Excellence.

  • Architect and scale analytical modeling systems for:

    • Real-time anomaly detection across metrics, logs and traces

    • Signal correlation and root cause analysis

    • Failure prediction and risk scoring

    • Capacity forecasting and saturation prediction

    • Intelligent alert noise reduction

  • Build ML pipelines that operate at hyperscale across distributed systems in Azure and other public cloud environments.

  • Lead the development of statistical, time-series and deep learning models tailored for infrastructure telemetry data.

  • Integrate ML models directly into monitoring and incident management workflows.

  • Ensure models are production-grade: reliable, explainable, scalable and cost-efficient.

  • Drive execution in partnership with infrastructure engineering, product and architecture teams.

  • Establish strong service ownership practices including SLOs, SLAs and operational metrics for ML-powered services.

  • Build and mentor a high-caliber team of ML engineers and distributed systems engineers.

  • Promote rigorous experimentation, model evaluation frameworks and data-driven decision making.

  • Recruit top talent in ML systems and infrastructure engineering.

Required Skills / Experience

  • 12+ years of experience in software development with 3+ years managing engineering teams.

  • Strong background in Machine Learning applied to large-scale systems or infrastructure problems.

  • Experience with large-scale data analytics.

  • Experience productionizing ML models in distributed cloud environments.

  • Strong foundation in Distributed Systems architecture.

  • Experience building or operating observability or telemetry platforms at scale.

  • Experience with public cloud platforms such as Azure, AWS or GCP.

  • Experience with large-scale data processing frameworks (e.g., Kafka, Spark, stream processing systems, NoSQL stores).

  • Strong service ownership mindset with experience defining and operating services with SLOs/SLAs.

  • Ability to balance research-oriented thinking with practical production delivery.

  • Proven track record of recruiting and developing high-performing technical teams.

  • Excellent written and verbal communication skills.

Preferred Qualifications

  • Experience building ML-driven monitoring or availability platforms.

  • Background in infrastructure reliability engineering or SRE environments.

  • Experience designing low-latency ML inference systems.

  • Contributions to research, patents, or technical publications in applied ML or distributed systems.

Unleash Your Potential

When you join Salesforce, you’ll be limitless in all areas of your life. Our benefits and resources support you to find balance and be your best, and our AI agents accelerate your impact so you can do your best. Together, we’ll bring the power of Agentforce to organizations of all sizes and deliver amazing experiences that customers love. Apply today to not only shape the future — but to redefine what’s possible — for yourself, for AI, and the world.

Accommodations

If you require assistance applying for open positions due to a disability, please submit a request via the Accommodations Request Form.

Posting Statement

Salesforce is an equal opportunity employer and maintains a policy of non-discrimination with all employees and applicants for employment. What does that mean exactly? It means that at Salesforce, we believe in equality for all. And we believe we can lead the path to equality in part by creating a workplace that’s inclusive and free from discrimination. Know your rights: workplace discrimination is illegal. Any employee or potential employee will be assessed on the basis of merit, competence and qualifications – without regard to race, religion, color, national origin, sex, sexual orientation, gender expression or identity, transgender status, age, disability, veteran or marital status, political viewpoint, or other classifications protected by law. This policy applies to current and prospective employees, no matter where they are in their Salesforce employment journey. It also applies to recruiting, hiring, job assignment, compensation, promotion, benefits, training, assessment of job performance, discipline, termination, and everything in between. Recruiting, hiring, and promotion decisions at Salesforce are fair and based on merit. The same goes for compensation, benefits, promotions, transfers, reduction in workforce, recall, training, and education.