Announcing the General Availability of AI Accelerator!

We are excited to announce that Fastly AI Accelerator is now generally available. AI Accelerator is a semantic caching solution for the large language model (LLM) APIs used in generative AI applications. It caches LLM responses based on the semantic similarity of queries, so queries that are similar (not just identical) can be served from cache. This yields faster response times for end users as well as cost savings. Since we launched the beta in June, we have been gathering your feedback and adding new capabilities. Here are the details:

Tell me more about semantic caching!

So glad you asked! With AI Accelerator, we are extending Fastly’s core caching expertise into the realm of semantic caching. Semantic caching is an advanced caching technique that improves data retrieval efficiency by storing query results so they can satisfy future requests that are similar in meaning, not only requests that match exactly. In other words, semantic caching is smart: it understands a query and caches based on meaning, not just keywords. For example, "How do I reset my password?" and "What are the steps to reset my password?" can be answered by the same cached response. It’s also a recommended best practice from major LLM providers.
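To make the idea concrete, here is a minimal sketch of how a semantic cache lookup can work in general. It is not Fastly's implementation. It assumes a hypothetical `toy_embed` function that turns text into a bag-of-words vector; a real semantic cache would call an embedding model instead, since word counts only capture overlap, not true meaning. The `SemanticCache` class, the `get_or_call` helper, and the 0.8 similarity threshold are all illustrative choices.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model.

    Builds a bag-of-words vector, which reflects word overlap only.
    A production semantic cache would call a real embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Reuse a stored reply when a new prompt is similar enough to a past one."""

    def __init__(self, threshold: float = 0.8,
                 embed: Callable[[str], Counter] = toy_embed):
        self.threshold = threshold  # illustrative cutoff, not a recommended value
        self.embed = embed
        self.entries = []  # list of (prompt vector, cached reply) pairs

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the reply of the most similar stored prompt, if above threshold."""
        query = self.embed(prompt)
        best_score, best_reply = 0.0, None
        for vector, reply in self.entries:  # linear scan; real systems use a vector index
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_reply = score, reply
        return best_reply if best_score >= self.threshold else None

    def store(self, prompt: str, reply: str) -> None:
        self.entries.append((self.embed(prompt), reply))

    def get_or_call(self, prompt: str, call_llm: Callable[[str], str]) -> str:
        """Serve from cache on a semantic hit; otherwise call the LLM and cache the reply."""
        cached = self.lookup(prompt)
        if cached is not None:
            return cached  # cache hit: the LLM API call is skipped
        reply = call_llm(prompt)
        self.store(prompt, reply)
        return reply
```

With the 0.8 threshold, "How do I reset my password please?" would be served from the entry stored for "How do I reset my password?" (cosine similarity of about 0.93), while "What is the weather today?" shares no words with it and would go to the LLM. The threshold is the key design trade-off: lower values raise the hit rate but risk returning answers to questions that were not actually asked.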

What is new with the GA release?

  • With our GA release, we now support more models, including Google Gemini and Microsoft Azure AI, in addition to OpenAI, which we supported during the beta.

  • We also added greater configurability via headers to help you control and monitor how AI Accelerator caches LLM responses. Our documentation on these headers can be found here.

  • Available for in-app purchase! Superusers can purchase (or cancel) in-app, with no service order and no trial necessary. If you are not currently a customer, you can self-purchase by credit card after opting into the $50/month usage plan.

Where can I see service usage metrics?

Per-customer service metrics are available on the AI service page. These include information on requests, tokens, and time saved.

How do I get started?

Check out our guide, which covers how to enable AI Accelerator. The guide also provides code examples for OpenAI, Azure OpenAI, and Google Gemini to help you configure your application to use AI Accelerator.

Let us know what you think!

As always, please leave any questions or comments below. We look forward to hearing from you!
