
Client

Confidential

Categories

Application Engineering

Industry

E-Commerce

Product Catalog Ingestion
For E-Commerce

We assisted a mid-sized fashion marketplace company in streamlining and scaling the ingestion of product catalogs from various retailers. This catalog data is critical for the company to generate personalized clothing recommendations for buyers.

The challenges

The client faced challenges in scaling the catalog ingestion process due to a growing number of retailers and increasing catalog sizes. The key goals were to:

➔ Keep catalog data up to date to maintain a good shopping experience.

➔ Provide raw catalog and user demographics data to the Data Science team for generating personalized product recommendations.

➔ Minimize latency and optimize resource utilization while consuming millions of catalog products.

➔ Ensure fault tolerance and data consistency in end-to-end processing.

➔ Simplify the onboarding of new retailers to ease integration with the marketplace.

Architecture

 

The architecture leveraged GCP’s managed services, along with open-source tools for data integration and orchestration. Below is an outline of the components:

1. Data Ingestion

GCP Composer ran a webhook-triggered job that extracted product catalog data from CSV files provided by catalog providers on a shared FTP server and loaded it into a PostgreSQL database.
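The extraction step can be sketched as a small parsing routine. This is a minimal illustration rather than the client's actual job: the feed columns (`sku`, `title`, `brand`, `price`, `currency`, `category`) and the retailer identifier are hypothetical, and the FTP fetch and PostgreSQL insert are omitted.

```python
import csv
import io

def parse_catalog_csv(raw: str, retailer_id: str) -> list:
    """Parse one retailer's CSV feed into normalized rows ready for a DB insert."""
    reader = csv.DictReader(io.StringIO(raw))
    rows = []
    for line in reader:
        rows.append({
            "retailer_id": retailer_id,
            "sku": line["sku"].strip(),
            "title": line["title"].strip(),
            "brand": line["brand"].strip(),
            # Store prices as integer cents to avoid floating-point drift.
            "price_cents": int(round(float(line["price"]) * 100)),
            "currency": line["currency"].upper(),
            "category": line["category"].lower(),
        })
    return rows

feed = "sku,title,brand,price,currency,category\nA1,Denim Jacket,Acme,79.99,usd,Outerwear\n"
rows = parse_catalog_csv(feed, "retailer-42")
```

A real job would loop over each retailer's files on the FTP share and batch-insert the resulting rows.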

2. Data Streaming

Kafka served as the backbone for real-time data streaming. A GCP Composer batch job pushed PostgreSQL data to Kafka topics, decoupling the ingestion and processing layers.
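A sketch of how records might be keyed before being pushed to a topic, assuming JSON-encoded values and a hypothetical `retailer_id`/`sku` schema; the actual producer call (via a Kafka client library) is omitted:

```python
import json

def to_kafka_message(row: dict):
    """Build a (key, value) pair for a catalog topic."""
    # Keying by retailer and SKU keeps every update for one product on the
    # same partition, so downstream consumers see those updates in order.
    key = f"{row['retailer_id']}:{row['sku']}".encode()
    value = json.dumps(row, sort_keys=True).encode()
    return key, value

key, value = to_kafka_message({"retailer_id": "r42", "sku": "A1", "price_cents": 7999})
# key == b"r42:A1"; value is the JSON-serialized row
```

Stable keying like this is what makes the decoupling safe: processors can be restarted or scaled out without reordering updates for any single product.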

3. Data Processing and Transformation

Data from Kafka was processed using lightweight transformation jobs implemented as Scala microservices.
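The production jobs were Scala microservices; as a language-neutral sketch of the kind of per-record transformation involved, here is a hypothetical normalizer (the field names and validation rules are illustrative, not the client's):

```python
def transform(product: dict):
    """Normalize one catalog record; returning None drops invalid rows."""
    if not product.get("sku") or product.get("price_cents", 0) <= 0:
        return None  # filtered out of the stream
    return {
        **product,
        "title": product.get("title", "").strip().title(),
        "in_stock": product.get("stock", 0) > 0,
    }
```

In a streaming topology this maps one-to-one onto a filter plus a map over the input topic, with the results written to an output topic or sink.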

4. Data Integration

Airbyte ingested data from various sources (PostgreSQL, FTP, etc.) into Google BigQuery, where it was consumed by the data science team to generate product recommendations.

5. Data Search and Retrieval

Elasticsearch stored the product catalog and recommendations by importing transformed data from BigQuery.
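The final hop of that import can be illustrated with Elasticsearch's `_bulk` NDJSON format (an action line, then the document). The index name and ID field here are hypothetical, and the HTTP call itself is omitted:

```python
import json

def bulk_index_body(index: str, docs: list, id_field: str = "sku") -> str:
    """Build an NDJSON body for Elasticsearch's _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: which index and document ID this operation targets.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = bulk_index_body("products", [{"sku": "A1", "title": "Denim Jacket"}])
```

Using the SKU as the document ID makes re-imports idempotent: re-indexing the same product overwrites the old document instead of duplicating it.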

6. Data Orchestration

GCP Composer coordinated the end-to-end pipeline. It:

  • Triggered Airbyte sync jobs and BigQuery transformation jobs in the required order.
  • Validated that the data matched expectations.
  • Ran the final export job from BigQuery to Elasticsearch.

 

7. Data Storage

Processed data was stored in PostgreSQL as the source of truth, in Google BigQuery for analytics, and in Elasticsearch for search and consumption. Partitioning and clustering ensured optimized query performance and cost management.

8. Monitoring and Alerting

Google Cloud Operations Suite and Airbyte’s monitoring tools were used to:

  • Monitor pipeline performance.
  • Track data anomalies.
  • Send alerts for failed or delayed jobs.

Company overview

Client name: Confidential
Services: Marketplace, Personalized fashion
Technology: GCP Composer, Kafka, Airbyte
Industry: E-Commerce
Location: USA

Details

The client is a consumer-facing fashion marketplace offering a personalized shopping experience.

How We Helped

Dedicated Team with Scala and Data Engineering expertise

Budget Optimisation

On time Delivery

Our Approach And The Solution

The architecture leveraged GCP's managed services along with open-source tools, with Kafka handling data streaming and GCP Composer handling orchestration.

  • We chose Kafka to stream catalog data because it is designed to handle large volumes of data efficiently.
  • Kafka served as the backbone for real-time data streaming. A GCP Composer batch job pushed raw catalog data to Kafka topics, decoupling the ingestion and processing layers.
  • Microservices were built in Scala using the Kafka Streams Scala client library.
  • Real-time transformations were applied to the Kafka topics, and the transformed catalog data was persisted in a PostgreSQL database.
  • Airbyte ingested data from various sources (PostgreSQL, FTP, etc.) into Google BigQuery, where it was consumed by the data science team to generate product recommendations.

Pizenith Technologies IT Advisor

+1 647-356-6855