
30 · API and Service Communication Selection

The thesis in one line: API selection is not a format choice between REST and gRPC. It is a boundary choice. Sync or async, internal or external, strict contract or flexible query, one response or continuous stream: these decide how services should communicate.


🧰 Technology Stack Selection Track, Chapter 4 · One thing to practice

Chapter 4 covered layered, microservice, and event-driven patterns. Chapter 29 covered async middle layers. Now we focus on service boundaries: once two systems talk to each other, you must choose a communication style, a contract, a versioning strategy, failure handling, and a permission boundary.


Opening: communication style determines coupling style

The same interaction, "order tells inventory to deduct stock", can be implemented as:

   A. Order calls inventory REST API synchronously
   B. Order calls inventory gRPC synchronously
   C. Order publishes OrderCreated; inventory consumes asynchronously
   D. Inventory exposes GraphQL; order queries as needed
   E. Inventory calls order via Webhook

All can work. They couple differently:

  • Synchronous calls give clear results, but the caller is slowed down or broken whenever the callee is.
  • Async events decouple services, but results are not immediate and consistency is harder to maintain.
  • GraphQL gives clients flexibility, but server-side governance and performance get harder.
  • Webhooks fit external notifications, but require retries, signature verification, and idempotency.

Architectural judgment: choose interaction semantics before protocol. Do not start with "we use gRPC." Start with "does this path need to know the result now?"


1. First cut: sync or async

Style | Good for | Cost
------|----------|-----
Synchronous request/response | User waits for the result; immediate validation; direct failure feedback | Longer chains; P99 tail latency adds up; failures propagate
Async message/event | Work can finish later; smooths spikes; multiple downstream consumers | State progression, idempotency, compensation, backlog
Streaming | Continuous output, realtime state, long-task progress | Connection management, backpressure, reconnect recovery

Rule:

   User must know whether to continue now -> sync
   User only needs "accepted"             -> async
   User needs continuous updates          -> streaming

In StarArena, entry admission must be synchronous; ticket notifications can be asynchronous; queue position updates fit streaming or polling.
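The sync/async split shows up directly in the API shape. Below is a minimal sketch, assuming an in-process `queue.Queue` stands in for a real message broker; `admit_user`, `request_ticket_notification`, and `drain_notifications` are hypothetical names, not StarArena's actual API. The sync call returns the decision itself, while the async call only returns "accepted" plus a task ID the client can check later.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Accepted:
    task_id: str
    status: str = "accepted"


# Hypothetical stand-in for a broker topic and a task-status store.
notifications: "queue.Queue[tuple[str, str]]" = queue.Queue()
task_status: dict[str, str] = {}


def admit_user(user_id: str, seats_left: int) -> bool:
    """Sync path: the user must know right now whether they may enter."""
    return seats_left > 0


def request_ticket_notification(user_id: str) -> Accepted:
    """Async path: acknowledge acceptance only; the real work happens later."""
    task_id = str(uuid.uuid4())
    task_status[task_id] = "pending"
    notifications.put((task_id, user_id))
    return Accepted(task_id=task_id)


def drain_notifications() -> None:
    """Consumer side: process queued tasks and advance their state."""
    while not notifications.empty():
        task_id, user_id = notifications.get()
        # A real consumer would send the notification to user_id here.
        task_status[task_id] = "done"
```

The caller of `request_ticket_notification` never learns whether the notification was delivered; it learns that the request was accepted. That is exactly the "user only needs accepted" case, and it is why the async path needs its own status tracking.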


2. REST, gRPC, and GraphQL do not replace each other

Style | Better fit | Poor fit
------|------------|---------
REST | External APIs, ordinary Web/SaaS, easy debugging, broad ecosystem | Very high-frequency internal calls that need strict types
gRPC | Internal service calls, low latency, high throughput, strict IDL | Browser-first public APIs, low-friction partner APIs
GraphQL | Multi-client aggregation, fast-changing fields, frontend flexibility | Complex writes; weak cache/permission/rate-limit governance
Webhook | Third-party event notification, payment callbacks, external integration | Core sync paths that need an immediate result
MCP | Exposing tools/resources/context to AI Agents | Ordinary business service calls with no Agent semantics

Boundary matters:

  • External APIs should be understandable, stable, and versionable.
  • Internal hot paths can prefer strong contracts and performance.
  • Frontend aggregation can use GraphQL if governance exists.
  • Webhooks need signature verification, idempotency, and replay protection.
  • Agent tool APIs need permission checks and human approval built into the boundary (Chapter 23).
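The webhook requirements can be sketched as three checks on the receiving side. This assumes a hypothetical signing scheme in which the sender computes HMAC-SHA256 over `"{timestamp}.{body}"` with a shared secret; real providers each define their own scheme, so treat the header layout and `SECRET` as placeholders. The in-memory `seen_event_ids` set stands in for a durable dedupe store.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical scheme and values; not any specific provider's format.
SECRET = b"shared-webhook-secret"
TOLERANCE_SECONDS = 300
seen_event_ids: set[str] = set()


def sign(timestamp: int, body: bytes, secret: bytes = SECRET) -> str:
    """Sender side: HMAC-SHA256 over '<timestamp>.<body>'."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(event_id: str, timestamp: int, body: bytes,
                   signature: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    # 1. Signature: constant-time comparison against the expected HMAC.
    if not hmac.compare_digest(sign(timestamp, body), signature):
        return "rejected: bad signature"
    # 2. Replay protection: refuse requests whose signed timestamp is stale.
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return "rejected: stale timestamp"
    # 3. Idempotency: a redelivered event is acknowledged, not reprocessed.
    if event_id in seen_event_ids:
        return "duplicate: already processed"
    seen_event_ids.add(event_id)
    # Advance the payment/order state machine here.
    return "processed"
```

Including the timestamp inside the signed message matters: an attacker who captures a valid request cannot refresh its timestamp without invalidating the signature.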

3. Contract matters more than protocol

API failures often come from unclear contracts:

Contract point | Must specify
---------------|-------------
Input/output | Field meaning, required/optional, units, enums
Error semantics | Retryable errors, user errors, system errors
Idempotency | What happens if the same request repeats?
Versioning | How fields are added, deprecated, removed
Rate limits | Who can call how much? What happens when the limit is exceeded?
Security boundary | Authn, authz, signatures, audit

Without contract governance, REST becomes random URLs, GraphQL becomes arbitrary database exposure, and gRPC becomes a strongly typed ball of mud.
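Two rows of that table, error semantics and idempotency, can be made concrete in a small sketch. It assumes a hypothetical `InventoryService` where the caller supplies an idempotency key and errors carry a stable code plus a `retryable` flag; a production version would need the key check and the stock update to be atomic (for example, in one database transaction).

```python
from dataclasses import dataclass, field


class ApiError(Exception):
    """An error that carries contract semantics, not just a message."""

    def __init__(self, code: str, retryable: bool):
        super().__init__(code)
        self.code = code            # stable, documented code clients branch on
        self.retryable = retryable  # contract: is retrying safe and useful?


@dataclass
class InventoryService:
    stock: dict[str, int]
    # Hypothetical idempotency store: key -> result of the first execution.
    results: dict[str, int] = field(default_factory=dict)

    def deduct(self, idempotency_key: str, sku: str, qty: int) -> int:
        """Deduct once per key; a repeated request returns the first result."""
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        if qty <= 0:
            raise ApiError("INVALID_QUANTITY", retryable=False)  # user error
        if self.stock.get(sku, 0) < qty:
            raise ApiError("OUT_OF_STOCK", retryable=False)
        self.stock[sku] -= qty
        self.results[idempotency_key] = self.stock[sku]
        return self.stock[sku]
```

With this contract, a client that times out can resend the same key without deducting twice, and it can read `retryable` instead of guessing from an error message.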


4. Internal calls: prevent unbounded call chains

Microservice performance often degrades through fan-out, where one user request triggers a tree of downstream calls:

   user request
      \- A
         |-- B
         |   |-- D
         |   \-- E
         \-- C
             |-- F
             \-- G

Each hop adds:

  • Network latency.
  • Timeout and retry storms.
  • Dependency failure propagation.
  • Trace/debugging cost.
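Fan-out also amplifies tail latency. As a rough worked example, assume each service the request touches independently exceeds its own P99 latency with probability 0.01. The chance that one user request waits on at least one slow call then grows with the number of services N; the request in the tree above touches N = 7 services (A through G):

```latex
\begin{aligned}
P(\text{at least one call slower than its P99}) &= 1 - (1 - 0.01)^{N} \\
N = 7:\quad 1 - 0.99^{7} &\approx 1 - 0.932 = 0.068
\end{aligned}
```

So roughly 6.8% of requests, not 1%, hit some dependency's P99. Real latencies are correlated, so this illustrates the trend rather than predicting a number.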

Internal APIs need:

  1. Timeout budgets: if the upstream request has a 500ms budget, the downstream calls cannot each get 500ms as well; they share whatever remains of the original budget.
  2. Retry discipline: retry only idempotent operations, with exponential backoff and jitter.
  3. Degradation: when a non-critical dependency fails, return a partial or fallback result.
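Timeout budgets and retry discipline can be sketched together. This is a minimal illustration, assuming a hypothetical `Deadline` object created once per user request and passed down, and treating `ConnectionError` as the only transient, retryable failure; real systems usually propagate the deadline in request metadata (gRPC deadlines work this way).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class Deadline:
    """One end-to-end budget shared by every hop of a single request."""

    def __init__(self, budget_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())


def call_with_retry(op: Callable[[float], T], deadline: Deadline, *,
                    idempotent: bool, max_attempts: int = 3,
                    base_delay_s: float = 0.05,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry only idempotent ops, with exponential backoff and full jitter,
    never exceeding the caller's remaining budget."""
    attempts = max(1, max_attempts) if idempotent else 1
    for attempt in range(attempts):
        timeout = deadline.remaining()
        if timeout <= 0:
            raise TimeoutError("deadline exhausted before call")
        try:
            # The downstream call gets what is left, not a fresh 500ms.
            return op(timeout)
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            # Full jitter: sleep a random time in [0, base * 2^attempt].
            delay = random.uniform(0, base_delay_s * (2 ** attempt))
            sleep(min(delay, deadline.remaining()))
    raise AssertionError("unreachable")
```

Jitter spreads retries from many callers over time, which is what keeps a brief downstream blip from turning into a synchronized retry storm.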

This is the resilience engineering covered in Chapter 12.
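Degradation means deciding in advance which dependencies are critical. A minimal sketch, assuming a hypothetical event page where seat availability is critical and recommendations are optional; the fetchers are passed in as plain functions so the failure handling is easy to see.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventPage:
    event_id: str
    seats_left: int                                           # critical
    recommendations: list[str] = field(default_factory=list)  # non-critical
    degraded: bool = False


def build_event_page(event_id: str,
                     get_seats: Callable[[str], int],
                     get_recommendations: Callable[[str], list[str]]) -> EventPage:
    # Critical dependency: if it fails, the whole request fails.
    seats = get_seats(event_id)
    try:
        recs = get_recommendations(event_id)
        return EventPage(event_id, seats, recs)
    except (ConnectionError, TimeoutError):
        # Non-critical dependency failed: return a partial page, flagged.
        return EventPage(event_id, seats, [], degraded=True)
```

The `degraded` flag lets callers and dashboards tell a partial response from a complete one instead of silently hiding the failure.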


5. External APIs: stability beats elegance

Once an external API is published, it is a promise:

  • Backward compatibility: adding fields is usually safe; removing or changing meaning is dangerous.
  • Stable errors: customers write code against error semantics.
  • Docs and examples: if external developers cannot understand it, elegance does not help.
  • Signatures and replay protection: critical for payments, webhooks, and Agent tools.
  • Audit and rate limits: you need traceability and abuse control.
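Why adding fields is usually safe comes down to how clients read responses. A minimal "tolerant reader" sketch, assuming a hypothetical `Order` payload where `currency` was added in a later version with a default of `"USD"` (an illustrative choice): the client ignores unknown fields and treats newer fields as optional.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Order:
    order_id: str
    amount_cents: int
    currency: str = "USD"  # hypothetical later addition, with a default


def parse_order(payload: str) -> Order:
    data: dict[str, Any] = json.loads(payload)
    return Order(
        order_id=data["order_id"],               # required since the first version
        amount_cents=int(data["amount_cents"]),  # required; unit fixed by contract
        currency=data.get("currency", "USD"),    # optional: old payloads still parse
    )  # any other fields are ignored rather than rejected
```

This client survives added fields in both directions, but renaming `amount_cents` would break it loudly, and switching its unit to dollars would break it silently. That is why removing fields or changing their meaning is the dangerous case.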

For platform products, the API is part of the product. Versioning and compatibility are architecture boundaries.


🎯 Quick check

🤔 Before choosing REST, gRPC, GraphQL, or Webhook, what should you judge first?
  • A. Which name is most popular
  • B. Whether this interaction is sync or async, internal or external, strict contract or flexible query, and how failure recovers
  • C. Which one uses the least code
🤔 A third-party payment provider needs to tell your system that a user has paid. What is a common, reasonable style?
  • A. Keep the order request blocked until payment completes
  • B. Receive a signed Webhook, verify it, then idempotently advance the payment/order state machine
  • C. Trust the frontend when it says payment succeeded

Chapter summary

  • Communication style determines coupling: sync, async, and streaming fit different semantics.
  • REST, gRPC, GraphQL, Webhook, and MCP have different boundaries.
  • Contract matters more than protocol: fields, errors, idempotency, versions, limits, security.
  • Internal calls need fan-out control: timeout budgets, retry discipline, degradation, and tracing.
  • An external API is a product promise: compatibility, docs, stable errors, signatures, audit.

Next: we have chosen service implementation, storage, middle layers, and communication. Chapter 31 asks where these things run, who scales them, who releases them, and who rescues them.

