LLM Infrastructure | RAG Pipelines | Agentic AI | Content Guardrails | Observability

Udayan Sawant.

Building the infrastructure that makes AI work in production.

whoami
GenAI & LLM Infrastructure Engineer · Wells Fargo
cat expertise.txt
LLM fine-tuning · RAG pipelines · LangChain · Vector DBs
Azure / GCP · Feature Store · ML Platform · 6+ yrs
status
# open to GenAI Infrastructure and Founding AI Engineer roles

I build AI that ships to production.

I'm a Generative AI & LLM Infrastructure Engineer with 6+ years at the intersection of data engineering and AI. Right now I'm at Wells Fargo, building Feature Infrastructure and the ML Platform that powers production AI systems.

My work spans the full AI delivery stack — from RAG pipeline architecture and LLM fine-tuning, to the data infrastructure and cloud-native deployment layer that keeps models reliable at scale. I've shipped on Azure, GCP, and AWS.

I also write on Medium about distributed systems and system design patterns — because understanding how things work at the infrastructure level is what separates engineers who build toys from engineers who build things that last.

6+
years in AI & Data Eng
3
Cloud platforms
20+
Medium articles
Tokens processed

Tools of the trade.

The full stack from raw data to deployed model — every layer of the AI delivery pipeline.

🤖 GenAI / LLM
LangChain · LlamaIndex · HuggingFace · OpenAI API · RAG Pipelines · Fine-tuning · Prompt Eng. · RLHF
🗄️ Vector & Search
Pinecone · Elasticsearch · Weaviate · FAISS · Semantic Search · Embeddings
☁️ Cloud & MLOps
Azure · GCP · AWS · Kubernetes · Docker · MLflow · Terraform
⚙️ Data Engineering
Apache Spark · Kafka · Airflow · Dagster · dbt · Databricks · Snowflake
🧰 Languages & Frameworks
Python · C++ · SQL · FastAPI · PyTorch · scikit-learn
📊 Observability
Arize · Grafana · Prometheus · CloudWatch · Tableau · Looker

Where I've built.

Six years across fintech, streaming, and public sector — each role a layer deeper into the AI & data infrastructure stack.

Wells Fargo · Software Engineer (Data, Feature Infra & ML Platform) · Oct 2024 → Present · San Francisco, CA
  • Designed and developed an OpenAI-compatible API suite (Vector Stores, Files, and Batches APIs), refactoring the existing chunk and file management APIs to align with OpenAI standards and support enterprise GenAI integration at scale.
  • Led API rationalisation for GenAI Data Services, implementing architectural enhancements to ingestion APIs that improved Change Data Capture tracking, embedding support, and fault tolerance; authored detailed API documentation covering trade-offs and onboarding steps.
  • Built a Content Caching feature on Google Vertex AI that caches high-frequency prompts to cut redundant LLM calls, reducing token consumption by up to ~75% on repeated queries and lowering p99 latency for cache hits from ~3.2s to ~0.4s.
  • Implemented Rate Limiting across the Tachyon API Suite, enforcing per-client and per-endpoint quotas — reducing downstream overload incidents by ~60% and improving platform stability under concurrent GenAI workloads.
  • Enabled Arize AI observability across Wells Fargo's Tachyon Platform, instrumenting full LLM tracing, embedding drift detection, and prompt/response monitoring, which gave teams end-to-end visibility into production model behaviour for the first time.
  • Currently building Trust & Safety Guardrails for GenAI pipelines — including input/output filtering, toxicity detection, and policy enforcement layers to ensure responsible AI deployment at enterprise scale.
  • Enhanced data indexing and retrieval through Elasticsearch reindexing, alias management, and validation processes; improved MongoDB performance through schema refinement, optimised indexing, and a TTLCache/asyncache-based caching utility that eliminated reliance on global state.
  • Integrated Kafka audit logging and health monitoring for Tachyon API routers; contributed to resilient deployment pipelines using Jenkins, Harness, and UCD across multiple environments.
  • Conducted vulnerability scans via SonarQube, updated security profiles, and led migration to compliant Python Buildpack versions aligned with Wells Fargo TCI security standards.
  • Assessed the impact of a Gemini model deprecation and drafted the transition plan, identifying performance and compatibility risks for downstream GenAI systems; collaborated across teams on security triage and on new API endpoints for ingestion metrics and status tracking.
J.P. Morgan · Data Engineer · Nov 2021 → Oct 2024 · San Francisco, CA
  • Designed an ETL strategy with PySpark and Snowflake that reduced processing time by 70% and saved $3.2M annually.
  • Led the end-to-end cloud migration of a petabyte-scale environment from on-premises data warehouses to Amazon Redshift.
  • Used Redshift's MPP architecture and columnar storage for high-performance querying and analytics.
  • Built real-time data pipelines using AWS Glue and Amazon Kinesis for high-throughput financial data ingestion.
  • Led migration of FINRA's data collection infrastructure from a legacy XML-based system to Amazon DocumentDB, cutting dev cycles by 50% and achieving 50% cost savings with AWS Graviton2.
  • Established proactive monitoring with CloudWatch, ensuring regulatory compliance and system health.
  • Built strategic reporting and market trend analysis tools using Redshift Spectrum for executive decision-making.
Netflix · Data Engineer · Aug 2020 → Nov 2021 · Los Gatos, CA
  • Developed Metaflow workflows managing up to 20,000 concurrent tasks across Netflix's ML infrastructure.
  • Built a subscription forecasting model using ARIMA, ETS, and STL time-series methods.
  • Transitioned local prototypes to production-grade schedulers using AWS Step Functions.
  • Evaluated model performance with MAE, MSE, and RMSE; implemented cross-validation for generalisability.
Pace University · Data Modeler (Graduate Research Assistant) · Sep 2019 → Jul 2020 · New York, NY
  • Built a Python NLP scraper, using NLTK, that structured data on 5M+ academic publications to power abstract search.
  • Reduced application issues by 47% through systematic bug fixing and feature improvements.
UNICEF · Data Analytics Associate · May 2019 → Aug 2019 · New York, NY
  • Led data-driven initiatives with senior leadership across the Asia-Pacific and US regions.
  • Automated historical data persistence in DynamoDB using Python for strategic market analysis.

Thinking in public.

I write about distributed systems, system design patterns, and AI — with the candour that most engineering blogs avoid.

Foundations.

Cornell University · Certificate, Product Management · 📍 Ithaca, New York

Pace University · Master of Science, Information Systems · 📍 New York, USA

Let's build something serious.

Whether you want to talk LLM infrastructure, RAG design, or just geek out over distributed systems trade-offs — I'm always up for it.