Why Synchronous WSGI Struggled Under High-Concurrency I/O
How synchronous WSGI workers became a bottleneck under high-concurrency I/O workloads, and why parts of the system were migrated from Django to Go-based services.
Problem
The first system was built as a Django-based monolithic architecture using Django + Gunicorn(WSGI)
And most CRUD features operated reliably within this structure
-
The request-response flow is straightforward, making debugging and transaction management relatively simple.Each incoming request was handled and returned by a
Gunicorn workerin a synchronous processing model -
In the first stage, development productivity is the hight priority. Django ORM and the Django Admin ecosystem enabled rapid feature development
However as the service grew, bolltenecks began to appear due to incresing:
- external API calls
- Google Login
- Authentication / Authorization workflows
- RDB(PostgreSQL) access and other I/O-heavy operations
This limitations were primarily caused by WSGI’s synchronous request-processing model
Architecture - Exchange Rate
Hackonomics Service that I made
- Users select their country in their profile, and the selected country is stored in DB
- After login, display exchange rate trends against USD based on the user’s selected country, visualized through graph over multiple time ranges(3 months, 6 months, 1 year, 2 years)
GET /api/exchange-rate/chart] --> B[Nginx] B --> C[Gunicorn WSGI Worker] C --> D[Django URL Router] D --> E[Django View] E --> F[ExchangeRateService
Business Logic Layer] %% User DB Query F --> G[Django ORM] G --> H[Generate SQL Query] H --> I[psycopg2 DB Driver] I -. TCP Socket Communication .-> J[(PostgreSQL)] J --> K[User Country Data
KR / JP / EU ...] %% External API F --> L[External Exchange Rate API] L -. HTTPS Request / Waiting .-> M[(frankfurter.app / Financial API)] M --> N[2-Year Exchange Rate JSON Data] %% Data Processing F --> O[Process Exchange Rate Data] O --> P[Generate Chart Data
3m / 6m / 1y / 2y] %% Serialization P --> Q[DRF Serializer] Q --> R[JSON Response] R --> S[Gunicorn Worker Returns Response] S --> T[Frontend Chart Rendering] %% Blocking Points I --> U["Blocking I/O
Waiting for PostgreSQL Response"] L --> V["Blocking I/O
Waiting for External API Response"] style U fill:#ffccc,stroke:#ff0000,stroke-width:2px style V fill:#ffccc,stroke:#ff0000,stroke-width:2px
Distinction
-
Django ORM : an abstraction layer that allows developers to interact with the database using Python objects instead of writing raw SQL query
-
Codebase : the overall code structure
-
Workload : the characteristics of the tasks a system is required to process
Example:
Workload Type Characteristics CPU-bound computation-intensive I/O-bound heavy waiting on I/O operations latency-sensitive response time is critical throughput-heavy high concurrent processing capacity is important
Flow
-
API request arrives from Frontend (GET /api/exchange-rate/)
This request is forwarded through
Nginx → Gunicorn → Django WSGI applicationGunicorn runs multiple WSGI worker processes and each incoming request is assigned to a specific worker (One request = One worker occupied)
Request reaches the Django WSGI application, the Authentication Middleware is executed first
- Validates JWT-based or session-based authentication
- Injects the current authenticated user (request.user) into the request context
-
Then, URL router resolves the endpoint and invoke the corresponding Django view
View is only for
HTTP-layer: request parsing, authentication validation, serializer validation, response formattingAfter verifying the authenticated user → Request to the Service Layer where the core business logic is executed
-
Serivce Layer
The service layer invokes Django ORM to retrieve the user information
User.objects.get(id=request.user.id) -> ORM QueryThis is not simply a python function call, It is
network-bound database operationDjango ORM converts the Python-based query into SQL. Then SQL is passed to psycopg2 (PostgreSQL DB Driver), which performs TCP socket-based network communication with PostgreSQL
- Send SQL → wait for DB response → receive result
During this entire process, Gunicorn worker remains occupied in a blocking state until the database response is fully received
-
Service Layer - External Exchange Rate API Call
Calls (frankfurter.app) to retrieve exchange rate data in JSON format. Once the external API response is received, the data is processed into (3nonth, 6month, 1year and 2 years) and transformed into a format compatible with Frontend chart library
-
Serializers
The python objects returned from Service Layer are serialized into JSON format using DRF seralizers
- Stable contract between Frontend and Backend
Eventually, Django view returns JSON response in the following format:
{ "3m": [...], "6m": [...], "1y": [...], "2y": [...] }Then Gunicorn Worker sends the final response back to Frontend
Production Issues
Bottlenecks started to appear during real production operation Main bottlenecks included:
- waiting for external financial API responses (frankfurter)
- PostgreSQL query waiting
- network I/O waiting
In practice, workers spent far more time in waiting states than performing actual CPU computations This exposed the limitations of the WSGI-based synchronous worker architecture
-
When many users simultaneously requested
exchange-rate charts, delayed external API responses occupied the worker pool → eventually impacted overall API response times
-
It became clear that areas involving external API communication and high-concurrency I/O workloads required an asynchronous architecture
Architecture - Login
Flow
-
Frontend initially Google Login and sends a request to Backend
The request flow is:
- Nginx → Gunicorn Worker → Django WSGI Application
-
Django URL Router resolves the
/auth/google/loginendpoint and invokes the corresponding Django ViewView was intentionally designed to remain
thinand only handleHTTP-layer, while the actual business logic was delegated to the Service Layer. -
OAuth Processing Inside the Service Layer
Inside Service layer, communicates with the Google OAuth server to exchange the
Authorization Code for an Access Tokenand retrieve user informationThe service layer then:
- checks existing users through the Django ORM
- creates new users if necessary
- stores refresh tokens
- updates the last login timestamp
-
Django ORM Query
After receiving Google user information, Service layer queries the database through the Django ORM
User.objects.get(email=google_email)This is a
Network-bound database operationinvolving communication with the PostgreSQL server. The ORM query is converted into an SQL query, and the generated SQL is transmitted to PostgreSQL through psycopg2 (PostgreSQL driver). -
New User Creation (INSERT Query)
If the user does not exist, a new user record is inserted into the database.
-
JWT / Refresh Token Processing
Generates JWT and refresh tokens for authentication and session management.
-
Serialization + Response
Finally, Django View returns a JSON response, and the Gunicorn worker sends the response back to Frontend
Problems in Production
The Google OAuth login flow was fundamentally based on synchronous blocking I/O As the production traffic increased, bottlenecks gradually began to appear during the Google OAuth login process.
The core bottlenecks were:
- Google OAuth API response waiting
- PostgreSQL query waiting
- Redis access waiting
- network I/O latency
In reality, workers spent significantly more time waiting for external API responses and database responses more than performing actual CPU computations
So, real bottleneck was much closer to:
- synchronous worker occupancy
- Worker Pool Exhaustion
During operation, if Gunicorn was running with 4 workers and 4 users simultaneously waited for Google OAuth responses, new login requests could no longer be processed immediately
The core issue was not merely request latency itself, but rather the fact that workers remained occupied during waiting states, eventually exhausting the entire worker pool
As a result:
- throughput reduction
- concurrency reduction
- increased login latency
all began occurring simultaneously
Attempt 1 - Gunicorn Worker Scaling
An initial attempt involved increasing the number of Gunicorn workers (4 → 8 workers) as horizontal scaling
However, increasing worker counts introduced additional problems:
- increased memory usage
- increased context switching overhead
- increased container/infrastructure costs
In addition, Django ORM and psycopg2 fundamentally operate synchronously
Simply introducing async views was not sufficient to solve the problem. Even when using async def, much of the existing codebase still depended on synchronous components such as:
- Django ORM
- psycopg2
- middleware
- third-party libraries
Final Decision
Eventually, the monolithic Django service was restructured into:
- Django handling core business logic
- high-concurrency I/O domains such as authentication, authorization, and OAuth processing being separated into a
Go-based service
Go’s goroutine-based concurrency model allowed the system to process large numbers of external OAuth requests simultaneously at a much lower cost, significantly reducing the worker occupancy problem