
add audio utils to handle model audio input #7850

Open · wants to merge 5 commits into main
Conversation


@pretbc pretbc commented Feb 25, 2025

[Feature] DSPy Audio/Video Support Tracking #7847

Followed the Gemini implementation from LiteLLM.

@isaacbmiller
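For readers unfamiliar with the LiteLLM Gemini convention this follows: inline audio rides in an `image_url` content part as a base64 data URL, with the mime type distinguishing it from an actual image. A minimal sketch of the idea (the helper name `audio_content_part` is hypothetical, for illustration only):

```python
import base64


def audio_content_part(wav_bytes: bytes) -> dict:
    """Build an OpenAI-style content part carrying WAV audio.

    The audio is base64-encoded into a data URL and sent under the
    image_url content type; the data URL's mime type (audio/wav) is
    what signals to the backend that this is audio, not an image.
    """
    b64 = base64.b64encode(wav_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": f"data:audio/wav;base64,{b64}"}


# Placeholder bytes stand in for a real WAV file here.
part = audio_content_part(b"RIFF....WAVEdata")
```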

@pretbc
Author

pretbc commented Feb 25, 2025

Let me check the pytest results.

@pretbc
Author

pretbc commented Feb 27, 2025

Ran locally: ruff check . --fix-only

@isaacbmiller
Collaborator

Will try to review this weekend. Would it be possible for you to make a mini tutorial or demo just to show that it works?

@pretbc
Author

pretbc commented Feb 27, 2025

I'd suggest the snippet below as a demo:

import base64
import os
import pathlib
from typing import Literal

import dspy

lm = dspy.LM(
    "gemini-2.0-flash-exp", api_key=os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
)
dspy.configure(lm=lm)

audio_path = "path/to/file.wav"

# Read the audio file and base64-encode it for inline transport.
audio_data = pathlib.Path(audio_path).read_bytes()
audio_data_base64 = base64.b64encode(audio_data).decode("utf-8")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": (
                    "Analyze audio. Choose sentiment from: 'excited', 'neutral', "
                    "'confused', 'frustrated', 'happy', 'sad', 'angry'"
                ),
            },
            {
                # Audio travels through the image_url content type as a data URL.
                "type": "image_url",
                "image_url": f"data:audio/wav;base64,{audio_data_base64}",
            },
        ],
    }
]


class Classify(dspy.Signature):
    """Classify sentiment of a given audio clip."""

    audio: dspy.Audio = dspy.InputField()
    text: str = dspy.InputField(desc="Task description")
    sentiment: Literal[
        "excited", "neutral", "confused", "frustrated", "happy", "sad", "angry"
    ] = dspy.OutputField()


classify = dspy.Predict(Classify)

print(f"Call directly LLM: {lm(messages=messages)}")
print(f"Call DSPy signature: {classify(audio=dspy.Audio.from_file(audio_path), text='Analyze audio')}")

Output:

Call directly LLM: ['Confused.']
Call DSPy signature: Prediction(
    sentiment='confused'
)

Under the hood, the system sends the following for the signature call:

[{"role": "system", "content": "Your input fields are:\n1. `audio` (Audio)\n2. `text` (str): Task description\n\nYour output fields are:\n1. `sentiment` (Literal['excited', 'neutral', 'confused', 'frustrated', 'happy', 'sad', 'angry'])\n\nAll interactions will be structured in the following way, with the appropriate values filled in.\n\n[[ ## audio ## ]]\n{audio}\n\n[[ ## text ## ]]\n{text}\n\n[[ ## sentiment ## ]]\n{sentiment}        # note: the value you produce must exactly match (no extra characters) one of: excited; neutral; confused; frustrated; happy; sad; angry\n\n[[ ## completed ## ]]\n\nIn adhering to this structure, your objective is: \n        Classify sentiment of a given sentence."}, {"role": "user", "content": [{"type": "text", "text": "[[ ## audio ## ]]"}, {"type": "image_url", "image_url": {"url": "data:audio/wav;base64,UklGRpLICABXQVZFZDOXj5XDk="}}, {"type": "text", "text": "[[ ## text ## ]]\nAnalyze audio\n\nRespond with the corresponding output fields, starting with the field `[[ ## sentiment ## ]]` (must be formatted as a valid Python Literal['excited', 'neutral', 'confused', 'frustrated', 'happy', 'sad', 'angry']), and then ending with the marker for `[[ ## completed ## ]]`."}]}]
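The user-message structure in that trace can be sketched as follows (a simplified reconstruction for illustration only; the real message is assembled by DSPy's adapter from the signature, and the helper name `build_user_content` is hypothetical):

```python
import base64


def build_user_content(audio_b64: str, task_text: str) -> list[dict]:
    """Approximate the user content shown in the trace above: each input
    field is introduced by a [[ ## name ## ]] marker, and the audio field
    becomes an image_url part wrapping a base64 data URL."""
    return [
        {"type": "text", "text": "[[ ## audio ## ]]"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:audio/wav;base64,{audio_b64}"},
        },
        {"type": "text", "text": f"[[ ## text ## ]]\n{task_text}"},
    ]


content = build_user_content(base64.b64encode(b"...").decode(), "Analyze audio")
```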

@pretbc
Author

pretbc commented Mar 3, 2025

I'll check for any conflicts after the review.

@pretbc force-pushed the feature/audio_utils branch from 37a8a61 to 0567ab7 on March 3, 2025 at 12:31