5x'ing our recommendation speed with Embeddings

When you’re looking to categorize, search, or recommend relevant content, embeddings are incredibly powerful.

Here's how I used the OpenAI embeddings API to get a 5x speed increase in our template recommender feature.

Where we started

Users would provide a page title they wanted to use for their new process, and we returned a list of relevant example templates based on that title.

Previously, we sent the title of the new page along with all of our template titles to GPT, and asked it to return a JSON-formatted list of similar templates.

For example, given the suggested title “How to call a customer”, we would get back the following:

[
  { name: 'Run a discovery call', relevance: 4 },
  { name: 'Learn how to run discovery calls', relevance: 5 },
  { name: 'Learn how to handle an objection', relevance: 3 }
]

These are the names of some of our pre-built templates.

However, this process was too slow: the benchmarked average was ~5500 milliseconds that the user had to wait before receiving their recommendations.
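That earlier flow looked roughly like this (the prompt wording, model choice, and helper names here are illustrative, not our exact code):

```javascript
// Build the prompt we'd send to GPT: the user's title plus every
// template title, with instructions to reply in JSON.
function buildRecommendationPrompt(title, templateNames) {
  return [
    `A user is creating a page titled "${title}".`,
    'From the template names below, return a JSON array of the most',
    'similar ones, e.g. [{ "name": "...", "relevance": 5 }].',
    '',
    templateNames.join('\n'),
  ].join('\n');
}

// The slow step: a full chat-completion round trip, then parsing
// the JSON out of the model's reply.
async function getRecommendationsViaGPT(openAIApi, title, templateNames) {
  const response = await openAIApi.createChatCompletion({
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'user', content: buildRecommendationPrompt(title, templateNames) },
    ],
  });
  return JSON.parse(response.data.choices[0].message.content);
}
```

Every recommendation request paid for a full round trip to the model, which is where the ~5500ms went.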

Introducing embeddings

Embeddings map phrases to points in a high-dimensional vector space (1,536 dimensions for OpenAI's text-embedding-ada-002 model). The smaller the distance between two points (equivalently, the higher their cosine similarity), the more similar the phrases.


After pre-calculating the embeddings for each template and storing them in our database, the code to get similar templates becomes really simple:

async function getEmbeddings ({ title, itemId }, context) {
  const openAIApi = await getOpenAIApi(context);

  // Pre-calculated embeddings for the titles of each template
  const templateEmbeddings = getTemplateEmbeddings();

  // The embedding for the user-suggested page title
  const response = await openAIApi.createEmbedding({
    model: 'text-embedding-ada-002',
    input: title,
  });
  const titleEmbedding = response.data.data[0].embedding;

  // Score each template against the title with a geometric
  // measure, usually cosine similarity
  const embeddingsWithSimilarities = templateEmbeddings.map((template) => ({
    ...template,
    similarity: cosineSimilarity(template.embedding, titleEmbedding),
  }));

  return embeddingsWithSimilarities;
}
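The similarity calculation itself is only a few lines. A minimal sketch of the cosine-similarity measure and the ranking step (plain JavaScript; the function names are my illustrative choices):

```javascript
// Cosine similarity: the dot product of two vectors divided by the
// product of their magnitudes. 1 means same direction (very similar),
// 0 means orthogonal (unrelated).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every template against the title embedding and sort best-first
function rankTemplates(templateEmbeddings, titleEmbedding) {
  return templateEmbeddings
    .map((t) => ({
      name: t.name,
      similarity: cosineSimilarity(t.embedding, titleEmbedding),
    }))
    .sort((a, b) => b.similarity - a.similarity);
}
```

Because this is just arithmetic over vectors we already have in memory, it runs in microseconds, not seconds.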


The final result

Pre-calculating the template embeddings means the only work left at request time is a very quick cosine-similarity check, rather than a round trip to GPT.
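The one-time pre-calculation is simple too: the createEmbedding endpoint accepts an array of inputs, so every template title can be embedded in a single batch request and the vectors persisted. A sketch, with the storage step left out and the helper names as assumptions:

```javascript
// One-off job: embed all template titles in a single batch request
async function precalculateTemplateEmbeddings(openAIApi, templates) {
  const response = await openAIApi.createEmbedding({
    model: 'text-embedding-ada-002',
    input: templates.map((t) => t.name), // the API accepts an array of inputs
  });
  // The API returns one embedding per input, each tagged with the
  // index of the input it belongs to
  return toStoredRecords(templates, response.data);
}

// Pure helper that pairs each returned vector back up with its
// template, ready to be written to the database
function toStoredRecords(templates, embeddingResponse) {
  return embeddingResponse.data.map(({ index, embedding }) => ({
    name: templates[index].name,
    embedding,
  }));
}
```

This job only needs to re-run when templates are added or renamed.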

With response times now averaging under 1000ms, our template recommender is a viable feature.

If you’re looking to build a feature involving recommendations, I highly suggest you take a look at embeddings. If they fit your use case, they can be a complete game-changer!