Looking for a technical candidate with Agentic AI experience and strong hands-on experience with Python.
Job Description
We are seeking a highly skilled AI / GenAI Architect to lead the design, development, and
end-to-end operationalization of our AI infrastructure. In this role, you will
be the driving force behind our agent-based architecture, responsible not only
for building cutting-edge GenAI agents but also for managing the
"run"—ensuring robust infrastructure, seamless production maintenance, and
continuous optimization. If you excel at bridging the gap between innovative
AI research and reliable, scalable enterprise operations (LLMOps/MLOps), this
role is for you.
* 10+ years of experience in Software/System Architecture or Cloud
Infrastructure.
* 3+ years of specific, hands-on experience deploying Machine Learning or
Generative AI models into production.
* Proven track record of managing the operational "run" of live software/AI
systems.
Job Responsibilities
* Architecture & Agent Development
  * System Design: Design, develop, and scale an enterprise-grade AI and
    agent-based architecture.
  * Agent Creation: Build, test, and deploy new autonomous and semi-autonomous
    AI agents tailored to specific business use cases (e.g., using LangChain,
    LlamaIndex, AutoGen, CrewAI).
  * Integration: Seamlessly integrate GenAI modules, Large Language Models
    (LLMs), Vector Databases, and Retrieval-Augmented Generation (RAG) pipelines
    into existing enterprise applications.
* Infrastructure & "Run" Management (LLMOps)
  * Infrastructure Provisioning: Provision, configure, and maintain the
    underlying cloud and AI infrastructure required to support LLM inference,
    vector storage, and agent orchestration.
  * Production Maintenance: Take full ownership of the "run" phase of AI
    implementation. Ensure production agents are highly available, scalable, and
    secure.
  * Strategy & Governance: Develop and implement a comprehensive strategy
    for the lifecycle management, versioning, and continuous maintenance of AI
    modules and agents in a live production environment.
  * CI/CD Pipeline: Build and manage automated deployment pipelines for AI
    models and agent updates to ensure zero-downtime deployments.
* Monitoring, Evaluation & Refinement
  * Performance Tracking: Implement robust monitoring systems to track the
    health, latency, and throughput of production agents.
  * Quality & Safety Control: Continuously monitor outputs for model drift,
    accuracy degradation, bias, and hallucinations.
  * Continuous Improvement: Establish feedback loops to refine prompts,
    fine-tune models, update RAG knowledge bases, and improve overall agent
    performance based on real-world production data.
Technical Expertise
* AI/GenAI Ecosystem: Deep expertise in LLMs (OpenAI, Anthropic, open-source
models like LLaMA), prompting frameworks, and agentic orchestration frameworks
(LangChain, AutoGen, Semantic Kernel).
* Programming: Strong proficiency in Python and relevant backend frameworks
(FastAPI, Flask, etc.).
* Data & Storage: Experience with Vector Databases (e.g., Pinecone,
Weaviate, Milvus, Qdrant) and data pipelines for RAG setups.
* Infrastructure & Cloud: Hands-on experience provisioning and managing
infrastructure on AWS, GCP, or Azure (Terraform, Kubernetes, Docker).
* LLMOps / MLOps: Proven experience with AI/ML monitoring and evaluation tools
(e.g., LangSmith, Weights & Biases, TruLens, MLflow).
* Architecture Patterns: Event Broker and Orchestrator/Supervisor patterns.
* Dashboards: Grafana.
Apply through whichever channel suits you best.