
Returning Complex Data for OpenAI Function Calling

OpenAI recently announced an important feature for developers using their API: function calling. Using a new model, we can now get structured data back from natural language, which makes it much easier to call out to other APIs.

The main example OpenAI gave calls out to a weather API. A function is described whose parameters form a simple object with two properties, location and unit. I’ve included it below for reference.

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather like in Boston?"}],
    functions=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    ],
    function_call="auto",
)

When given the input “What's the weather like in Boston?”, the model is able to parse the location, determine that the unit isn’t required, and return a JSON structure that can be used for the function call.
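Note that the model never calls the function itself; it only hands back the function name and its arguments. Handling that response looks roughly like the sketch below, which assumes the response object from the snippet above and treats get_current_weather as whatever weather API wrapper you implement yourself.

import json

message = response["choices"][0]["message"]

if message.get("function_call"):
    # The arguments come back as a JSON string, so parse them first
    arguments = json.loads(message["function_call"]["arguments"])
    location = arguments["location"]
    unit = arguments.get("unit", "fahrenheit")  # unit is optional in the schema
    # weather = get_current_weather(location, unit)  # your own weather lookup goes here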

Returning Collections of Objects

One point that seems to have been overlooked in the basic example OpenAI gave is that this feature can generate fairly complex data structures from a request.

In the following example I ask for music suggestions as an array of track objects, each of which is itself an object with an artist and a title.

import openai
import os

openai.api_key = os.getenv("OPENAI_SECRET_KEY")

example_input = "Recommend me a tracklist of 20 tracks and remixes from 2019 in the relaxing, deep house genre. Don't include tracks that are over 15 mins long."

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[
        {
            "role": "user",
            "content": example_input,
        }
    ],
    # Describe the desired output as a JSON Schema: an object containing an
    # array of track objects, each with an artist and a title
    functions=[
        {
            "name": "get_tracklist",
            "description": "Gets a list of tracks from a query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "tracklist": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "artist": {
                                    "type": "string",
                                    "description": "The artist of the track.",
                                },
                                "title": {
                                    "type": "string",
                                    "description": "The title of the track.",
                                },
                            }
                        },
                    },
                },
                "required": ["tracklist"],
            }
        }
    ],
    function_call="auto"
)
print(completion)

Which gives us a response like this:

{
  "id": "chatcmpl-7RJeo32PyJxRUqeeWajCSOhrD4vBw",
  "object": "chat.completion",
  "created": 1686744774,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_tracklist",
          "arguments": "{\n  \"tracklist\": [\n    {\n      \"artist\": \"Lane 8\",\n      \"title\": \"Sunday Song\"\n    },\n    {\n      \"artist\": \"Catching Flies\",\n      \"title\": \"Satisfied\"\n    },\n    {\n      \"artist\": \"Cubicolor\",\n      \"title\": \"Points Beyond\"\n    },\n    {\n      \"artist\": \"TSHA\",\n      \"title\": \"Sacred\"\n    },\n    {\n      \"artist\": \"Bonobo\",\n      \"title\": \"Linked\"\n    },\n    {\n      \"artist\": \"Yotto\",\n      \"title\": \"Turn It Around\"\n    },\n    {\n      \"artist\": \"Oona Dahl\",\n      \"title\": \"Godtripper\"\n    },\n    {\n      \"artist\": \"Mathame\",\n      \"title\": \"Skywalking\"\n    },\n    {\n      \"artist\": \"Bicep\",\n      \"title\": \"Glue\"\n    },\n    {\n      \"artist\": \"Amtrac\",\n      \"title\": \"Never Lost\"\n    },\n    {\n      \"artist\": \"Luttrell\",\n      \"title\": \"After All\"\n    },\n    {\n      \"artist\": \"Durante\",\n      \"title\": \"Maia\"\n    },\n    {\n      \"artist\": \"Tourist\",\n      \"title\": \"Gin Under the Sink\"\n    },\n    {\n      \"artist\": \"Yotto\",\n      \"title\": \"Nova\"\n    },\n    {\n      \"artist\": \"Lane 8\",\n      \"title\": \"Visions\"\n    },\n    {\n      \"artist\": \"Christian L\u00f6ffler\",\n      \"title\": \"Ry\"\n    },\n    {\n      \"artist\": \"Catching Flies\",\n      \"title\": \"Komorebi\"\n    },\n    {\n      \"artist\": \"Bonobo\",\n      \"title\": \"Ibrik\"\n    },\n    {\n      \"artist\": \"TSHA\",\n      \"title\": \"Moon\"\n    },\n    {\n      \"artist\": \"Cubicolor\",\n      \"title\": \"No Dancers\"\n    }\n  ]\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 97,
    "completion_tokens": 437,
    "total_tokens": 534
  }
}

Since the “arguments” value is returned as a string, we can use json.loads to parse it into a Python dictionary.
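Something like the following, continuing from the completion object above:

import json

arguments = completion["choices"][0]["message"]["function_call"]["arguments"]
parsed = json.loads(arguments)
print(parsed)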

{'tracklist': [{'artist': 'Lane 8', 'title': 'Sunday Song'}, {'artist': 'Catching Flies', 'title': 'Satisfied'}, {'artist': 'Cubicolor', 'title': 'Points Beyond'}, {'artist': 'TSHA', 'title': 'Sacred'}, {'artist': 'Bonobo', 'title': 'Linked'}, {'artist': 'Yotto', 'title': 'Turn It Around'}, {'artist': 'Oona Dahl', 'title': 'Godtripper'}, {'artist': 'Mathame', 'title': 'Skywalking'}, {'artist': 'Bicep', 'title': 'Glue'}, {'artist': 'Amtrac', 'title': 'Never Lost'}, {'artist': 'Luttrell', 'title': 'After All'}, {'artist': 'Durante', 'title': 'Maia'}, {'artist': 'Tourist', 'title': 'Gin Under the Sink'}, {'artist': 'Yotto', 'title': 'Nova'}, {'artist': 'Lane 8', 'title': 'Visions'}, {'artist': 'Christian Löffler', 'title': 'Ry'}, {'artist': 'Catching Flies', 'title': 'Komorebi'}, {'artist': 'Bonobo', 'title': 'Ibrik'}, {'artist': 'TSHA', 'title': 'Moon'}, {'artist': 'Cubicolor', 'title': 'No Dancers'}]}

How this Helps

Previously, we would have needed to explain to GPT via the prompt exactly how it should structure its output, and then write our own parsing for whatever text came back. Here we get a JSON collection we can work with straight away; a single json.loads and we’re done.
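For example, the parsed tracklist is just a list of dictionaries we can loop over directly (continuing from the parsed variable in the snippet above):

for track in parsed["tracklist"]:
    print(f"{track['artist']} - {track['title']}")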

The great thing about this is that it’s very easy to add new properties to the child object. Say I also want the record label each track was released on. I can simply add it as a property to the schema and it will be returned. Previously this would have meant rewriting the prompt and the parsing code; with function calling it’s just another type description.
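As a rough sketch, the hypothetical label property just slots into the items schema alongside artist and title:

track_item_schema = {
    "type": "object",
    "properties": {
        "artist": {"type": "string", "description": "The artist of the track."},
        "title": {"type": "string", "description": "The title of the track."},
        # Hypothetical new property - nothing else about the call needs to change
        "label": {"type": "string", "description": "The record label the track was released on."},
    },
}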

The tradeoff of using the new model is cost. Whilst OpenAI announced price reductions for input tokens, completion token prices remain the same, and if you modify existing calls to use function calling it’s possible that the number of completion tokens you use will actually increase. In my tracklist example, after switching to function calling the number of completion tokens roughly doubled.
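The usage block in the response makes this easy to check before and after converting a prompt (again using the completion object from earlier):

usage = completion["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
# e.g. 97 437 534 for the tracklist request above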
