What you'll do
- Design and ship agentic systems (tool calling, multi-agent workflows, structured outputs) that reliably fetch, extract, and normalize data across the web and APIs.
- Build and operate search / indexing pipelines on OpenSearch / Elasticsearch (schema design, analyzers, reindex / data migration strategies, relevance tuning).
- Own robust web scraping: directory crawling, CAPTCHA handling, headless browsers, rotating proxies, anti-bot evasion, and backoff / retry policies.
- Develop backend services in Python + FastAPI with clean contracts and strong observability.
- Scale workloads on AWS + Docker (batch / queue workers, autoscaling, fault tolerance, cost control).
- Parallelize external API requests safely (rate limits, idempotency, circuit breakers, retries, dedupe).
- Integrate third-party APIs for enrichment and search; model and cache responses; manage schema evolution.
- Transform and analyze data using Pandas (or similar) for normalization, QA, and reporting.
- Pitch in across the stack: billing (Stripe) and occasional front-end changes to ship end-to-end features.
Minimum requirements
- Hands-on experience with agentic architectures (tool calling, structured outputs / JSON, planning / execution loops) and prompt engineering.
- Deep knowledge of OpenSearch / Elasticsearch: index design, analyzers, ingestion pipelines, snapshots, rolling upgrades, and zero-downtime reindexing / data migrations.
- Proven web scraping expertise: solving CAPTCHAs, session / auth flows, proxy rotation, stealth techniques, and legal / ethical constraints.
- AWS + Docker in production (at least two of: ECS / EKS, Lambda, SQS / SNS, Batch, Step Functions, CloudWatch).
- Building high-throughput data / IO pipelines with concurrency (asyncio / multiprocessing), resilient retries, and rate-limit aware scheduling.
- Integrating diverse external APIs (auth patterns, pagination, webhooks); designing stable interfaces and backfills.
- Strong data wrangling with Pandas or equivalent; comfort with large CSV / Parquet workflows and memory / perf tuning.
- Familiarity with Stripe (subscriptions, metered billing, webhooks) and basic front-end changes (React / TypeScript or similar).
- Excellent ownership, product sense, and pragmatic debugging.
Nice to have
- Entity resolution / record linkage at scale (probabilistic matching, blocking, deduping).
- Experience with Langfuse, OpenTelemetry, or similar for tracing / evals; task queues (Celery / RQ), Redis, Postgres.
- Search relevance (BM25 / vector / hybrid), embeddings, and retrieval pipelines.
- Playwright / Selenium, stealth browsers, anti-bot frameworks, CAPTCHA providers.
- CI / CD, infrastructure as code (Terraform), and cost / perf observability.
- Security & compliance basics for data handling and PII.