Check out this blog post from @JonathanSpeek on his process for building an AI routing gateway with Mercury2 on Fastly Compute (there’s even a video demo).
TL;DR – Based on the nature of the end user’s query, the gateway routes each request to the most efficient and cost-effective model for the job. At scale, those token savings add up to major cost reductions!
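If you want a feel for the idea before reading the post, here is a rough sketch of query-based routing. It is not Jonathan's implementation and not Fastly or Mercury2 API code: the model names, prices, and keyword heuristics are all made up for illustration, and it is written in plain Python rather than as a Compute service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # made-up price, for illustration only


# Two hypothetical tiers: a cheap fast model and a pricier, more capable one.
CHEAP = ModelTier("small-fast-model", 0.10)
PREMIUM = ModelTier("large-reasoning-model", 1.00)

# Assumed signals that a query needs the more capable model.
COMPLEX_HINTS = ("explain why", "step by step", "prove", "refactor", "debug")


def route(query: str) -> ModelTier:
    """Pick a tier from simple surface features of the query."""
    text = query.lower()
    is_long = len(text.split()) > 50
    looks_complex = any(hint in text for hint in COMPLEX_HINTS)
    return PREMIUM if (is_long or looks_complex) else CHEAP


def estimated_cost(query: str, tokens: int) -> float:
    """Estimate the cost of serving `tokens` tokens on the routed tier."""
    return route(query).cost_per_1k_tokens * tokens / 1000


if __name__ == "__main__":
    for q in ("What time is it in Tokyo?", "Debug this stack trace step by step"):
        print(q, "->", route(q).name)
```

A real gateway would likely use a better classifier than keyword matching, but the shape is the same: inspect the query, pick a tier, and only pay for the expensive model when the query seems to need it.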
As always, we’d love to hear more about what you’re building on Fastly, especially with AI. Let us know in the comments <3