Yogesh Desai
Talk Title
GraphQL as an API Gateway for E-commerce at Scale with Tooling.
Talk Abstract
As more companies run GraphQL in production under different requirements, it has become important to know the best practices for a production environment. Different domains have different use cases, and implementations vary according to their priorities. At Tokopedia, the #1 Indonesian e-commerce company, we have implemented GraphQL as an API Gateway. The defining factor is the scale at which Tokopedia operates in production: at very high scale we had to tackle many problems and implement a range of features. This talk focuses on those areas: the tooling needed, observability, and features such as SingleFlight, caching, circuit breakers, rate limiters, error handling, security features, and multi-DC/multi-cloud enablement. It will be an in-depth discussion of all of the above and of how to implement them at large scale for the e-commerce domain.
Talk Description
GraphQL was born when Facebook wanted to expand into Africa and Asia, where 2G and 3G network connectivity is an issue. At Tokopedia we had the same kind of problem: Indonesia is a country of 18,307 islands, and network connectivity is an issue across its regions. That’s why we started implementing GraphQL, which enables us to make network calls more efficiently. We run the GraphQL layer as an API Gateway. With GraphQL we were able to reach 60K requests per second within just 2-3 months of development. We are also probably the biggest implementation of GraphQL as an API Gateway in the Asia region. GraphQL helped us carry out all our events smoothly by avoiding downtime.

Let’s go deeper by getting to know the company background and the scale at which the company operates. We already had a lot of services running in production at scale. In 2017 we started to implement GraphQL as a microservice in the organization. From a small service, the GraphQL layer became one of the super-critical services at Tokopedia in a very short time. We also have a unique tooling requirement in terms of code generators. Let’s dive deep into each of the features.

GQLTools: We have a unique tool, which we call GQLTools, that takes service curl calls as input and generates the schema and resolvers automatically. It is probably not available in the market, as it is highly specific to the use case of “GraphQL as an API Gateway for the e-commerce domain”. Basically, GQLTools automates the process of integrating any microservice at the GraphQL layer. It grew and evolved over time as we kept developing and automating it. Currently, we have automated everything end to end, including development, unit testing, and integration testing.

SingleFlight: It often happens that a user sends multiple duplicate requests, which results in unnecessary traffic on the network and puts an unnecessary load on the microservice. SingleFlight helps a lot in such scenarios, and we implemented it at the GraphQL layer.

Caching: Caching also helps a lot to reduce load on the microservices and improves overall speed, resulting in a better user experience. We implemented a number of caching techniques, such as in-memory caches, and they helped us greatly in terms of performance. Caching also helps maintain the user experience for a while even when a service is down. At Tokopedia's scale, we implemented a couple of caching mechanisms and tuned them to our custom requirements.

Circuit Breakers and rate limiters: Circuit breakers and rate limiters help manage traffic scenarios and are a must-have for guarding the backend services. If a service goes down due to some issue, the circuit breaker comes into action (opens) and blocks requests going to that service; it closes again once the service is back up. Rate limiters help by blocking excessive or erroneous requests coming to the service.

Security: Security features such as HMAC and CSRF protection help improve overall security in the environment and across the organization, guarding against attacks from outside. Error handling is another important aspect, covering how different scenarios and requests are handled.

Observability Platform: We needed to track and observe many things once all the traffic started arriving in one place. This also helped a lot in improving the performance and speed of the services and reducing latencies. Authentication moved to one place, reducing the load on the accounts service and improving the overall ecosystem. We also got a sense of traffic routes and were able to improve their performance. We have built an in-house platform similar to Apollo's Optics. It is a complete ecosystem of dashboards, service alerts, and action items for different events such as query counts, RPS, and fluctuations in traffic.
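To make the SingleFlight idea concrete, here is a minimal sketch of request coalescing: concurrent callers that ask for the same key share one backend call instead of each making their own. It is an illustration only, not Tokopedia's implementation. The class name, the `waiting` helper, the key `"product:42"`, and `fetch_product` are all hypothetical, and Python is used purely for readability (the name itself echoes Go's `golang.org/x/sync/singleflight` package).

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SingleFlight:
    """Coalesce concurrent calls that share a key into one backend call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> [shared Future, count of waiting duplicates]

    def waiting(self, key):
        """Number of duplicate callers currently waiting on `key`."""
        with self._lock:
            entry = self._in_flight.get(key)
            return entry[1] if entry else 0

    def do(self, key, fn):
        with self._lock:
            entry = self._in_flight.get(key)
            leader = entry is None
            if leader:
                entry = self._in_flight[key] = [Future(), 0]
            else:
                entry[1] += 1
        future = entry[0]
        if not leader:
            return future.result()  # duplicate: block until the leader finishes
        try:
            future.set_result(fn())  # leader: the only caller that hits the backend
        except Exception as exc:
            future.set_exception(exc)  # duplicates observe the same failure
        finally:
            with self._lock:
                del self._in_flight[key]  # later calls trigger a fresh fetch
        return future.result()


def demo(duplicates=8):
    backend_calls = []
    release = threading.Event()

    def fetch_product():  # hypothetical stand-in for a slow microservice call
        backend_calls.append(1)
        release.wait(timeout=5)
        return {"id": 42, "name": "example product"}

    sf = SingleFlight()
    with ThreadPoolExecutor(max_workers=duplicates) as pool:
        futures = [pool.submit(sf.do, "product:42", fetch_product)
                   for _ in range(duplicates)]
        # Hold the backend call open until every duplicate has attached to it.
        deadline = time.monotonic() + 5
        while sf.waiting("product:42") < duplicates - 1 and time.monotonic() < deadline:
            time.sleep(0.001)
        release.set()
        results = [f.result() for f in futures]
    return len(backend_calls), results


if __name__ == "__main__":
    calls, results = demo()
    print(f"{len(results)} callers served by {calls} backend call(s)")
```

Note that this is not a cache: the entry is removed as soon as the call completes, so only requests that overlap in time are merged. A gateway would typically derive the key from the query and its variables, which is an assumption here rather than something the talk specifies.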
Finally, the talk will share the overall journey of GraphQL at Tokopedia so far.
Bengaluru, Karnataka