Announcing AI Accelerator General Availability!

tgretto · December 12, 2024, 7:51pm

We are excited to announce that Fastly AI Accelerator is now generally available. AI Accelerator is a semantic caching solution for large language model (LLM) APIs used in generative AI applications. The product caches replies from LLM queries based on semantic similarity, allowing for flexible caching of query results and yielding faster response times to end users as well as cost savings. Since we launched the beta back in June, we have been capturing your feedback and adding new capabilities. Here are the details:

Tell me more about semantic caching!

So glad you asked! With AI Accelerator, we are extending Fastly’s core caching expertise into the realm of semantic caching. Semantic caching is an advanced caching technique that improves data retrieval efficiency by storing query results in a way that lets them intelligently satisfy future requests. In other words, semantic caching is smart: it understands a query and caches based on meaning, not just keywords. It’s also a recommended best practice from major LLM providers.
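To make the idea concrete, here is a minimal Python sketch of how a semantic cache lookup can work in general. It is not Fastly's implementation: the embedding function, the cosine-similarity comparison, and the 0.9 similarity threshold are all assumptions for illustration, and `SemanticCache`, `cached_completion`, `TOY_VECTORS`, and `fake_llm` are hypothetical names. A real system would use an embedding model and a vector index rather than hand-written vectors and a linear scan.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Toy semantic cache (hypothetical): returns a stored response when a new
    prompt's embedding is close enough to a previously cached prompt's."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed          # assumed: maps text to a meaning vector
        self.threshold = threshold  # assumed: minimum similarity for a hit
        self.entries: List[Tuple[Vector, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        # Linear scan for the most similar cached prompt.
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


def cached_completion(cache: SemanticCache, prompt: str,
                      call_llm: Callable[[str], str]) -> str:
    """Serve from the cache on a semantic hit; otherwise call the LLM and store."""
    hit = cache.get(prompt)
    if hit is not None:
        return hit
    response = call_llm(prompt)
    cache.put(prompt, response)
    return response


# Usage with hand-made vectors standing in for a real embedding model.
TOY_VECTORS = {
    "What is the capital of France?": [0.9, 0.1, 0.0],
    "Tell me France's capital city": [0.88, 0.15, 0.02],
    "How do I bake bread?": [0.0, 0.2, 0.95],
}
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # record every (simulated) upstream LLM call
    return f"answer to: {prompt}"


cache = SemanticCache(embed=lambda text: TOY_VECTORS[text], threshold=0.9)
first = cached_completion(cache, "What is the capital of France?", fake_llm)
second = cached_completion(cache, "Tell me France's capital city", fake_llm)
third = cached_completion(cache, "How do I bake bread?", fake_llm)
```

The second prompt is worded differently from the first but lands close to it in the toy vector space, so it is answered from the cache without a second LLM call; the unrelated third prompt misses and goes upstream. The threshold trades hit rate against the risk of returning an answer to a question that only looks similar.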

What is new with the GA release?

With our GA release, we now support more models, including Google Gemini and Microsoft Azure AI, in addition to OpenAI, which we supported in the beta.
We also added greater configurability via headers, so you can control and monitor how AI Accelerator caches LLM responses. See our documentation for details on these headers.
Available for in-app purchase! Superusers can purchase (or cancel) AI Accelerator in-app, with no service order or trial necessary. If you are not currently a customer, you can self-purchase by credit card after opting into the $50/month usage plan.

Where can I see service usage metrics?

Per-customer service metrics are available on the AI service page, including requests, tokens, and time saved.

How do I get started?

Check out our guide that covers how to enable AI Accelerator. The guide also provides code examples for OpenAI, Azure OpenAI, and Google Gemini to help you configure your application to use AI Accelerator.

Let us know what you think!

As always, please leave any questions or comments below. We look forward to hearing from you!
