Title: Principal AI Engineer - Scalable Systems, AI.DA STC
ST Engineering Hub, SG
About the Role
We’re looking for a senior Software Engineer to help us build the next-generation agentic AI platform for computer vision. You’ll own core components of the platform that orchestrate AI agents, power automated ML workflows, and deliver robust, production-grade systems.
This role blends backend excellence, infrastructure ownership, and a collaborative engineering mindset to scale the capabilities of AI Engineers and AI Developers. You’ll work closely with AI colleagues to bring intelligent systems to life - helping them move from local experimentation to full production.
This is a hands-on, high-impact role for someone who thrives at the intersection of scalable systems and cutting-edge AI.
Key Responsibilities
Platform & Backend Development
- Design backend services (Python, FastAPI, gRPC) to support agent workflows, computer vision pipelines, and evaluation loops.
- Build scalable APIs for orchestration, task management, vector search, and model serving.
Infrastructure & Deployment
- Own CI/CD pipelines (GitHub Actions, Terraform) and production deployments.
- Develop infrastructure for memory stores, compute orchestration, and model packaging (Docker, TorchServe, BentoML).
Engineering Excellence
- Establish quality practices including testing (Pytest), monitoring, and observability (Prometheus/Grafana).
- Ensure fault-tolerant, modular, and scalable system design.
Collaboration & Leadership
- Mentor peers through code reviews, documentation, and clean architecture.
- Lead system design discussions and integration with AI and platform teams.
Must-Have Skills
- 6+ years of software engineering experience, including 2+ years in AI/ML environments.
- Proficient in Python and production-grade API development (FastAPI, Flask, gRPC).
- Experience with CI/CD and infrastructure-as-code (GitHub Actions, Terraform).
- Skilled in containerization (Docker, Kubernetes) and cloud platforms (AWS, GCP, or Azure).
- Familiarity with databases: SQL, NoSQL, and vector DBs (FAISS, Weaviate, pgvector).
- Understanding of ML lifecycles: data ingestion, inference, monitoring, and recovery.
- Proven ability to design distributed systems (API gateways, data pipelines, compute orchestration).
Bonus Skills
- Familiarity with AI agent frameworks (LangChain, AutoGen, CrewAI).
- Understanding of computer vision concepts and deployment challenges.
- Exposure to LLM APIs or GenAI integrations.
- Experience with ML observability and error logging systems.
- Knowledge of front-end prototyping tools (Gradio, Streamlit, etc.).
What We Offer
- Small, agile team (5–6 engineers + interns) with autonomy and real ownership.
- Startup feel with big-company resources:
- International environment: most of the team and leadership come from startups or large international corporations (Lazada, Gojek, IBM) and from a range of countries.
- Low-bureaucracy, high-impact startup environment where your code directly supports next-gen AI deployment.
- Experimentation and self-development are part of our culture.
- Knowledge sharing and collaboration.
- Direct collaboration with top AI researchers and computer vision scientists.
- This role is offered on a 2-year employment contract.
- Hybrid work setup: ~2–3 days in office per week.