Hey all! I tried to track the number of tokens spent via `ChatResult.cost` across consecutive executions of my code. On the first run, when the cache gets created, the costs for usage including and excluding cached inference are identical. Running the code again (now that the cache exists), the cost including cached inference is the same as before, while the cost excluding cached inference is 0. My code:

```python
import autogen
import os
from dotenv import load_dotenv

load_dotenv()

config_list = [
    {
        "model": os.getenv("AZURE_OPENAI_API_DEPLOYMENT_NAME"),
        "api_type": "azure",
        "api_key": os.getenv("AZURE_OPENAI_API_KEY"),
        "base_url": os.getenv("AZURE_OPENAI_ENDPOINT"),
        "api_version": os.getenv("AZURE_OPENAI_API_VERSION"),
    }
]

llm_config = {
    "seed": 43,  # enables response caching ("cache_seed" in newer 0.2 releases)
    "config_list": config_list,
    "temperature": 0,
    "timeout": 120,
}

user_proxy = autogen.UserProxyAgent(
    name="User_Proxy",
    code_execution_config={
        "last_n_messages": 2,
        "work_dir": "coding",
        "use_docker": False,
    },
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    human_input_mode="NEVER",
    llm_config=llm_config,
)

entra_agent = autogen.AssistantAgent(
    name="entra_agent",
    llm_config=llm_config,
    system_message="""You are an expert Python developer specialized in working with Microsoft Entra. Your primary skills include:
...""",
)

departments = ["CS Technical", "CS Account", "CS Billing", "CS Other"]
tenant_id = os.getenv("MICROSOFT_ENTRA_TENANT_ID")
client_id = os.getenv("MICROSOFT_ENTRA_CLIENT_ID")
client_secret = os.getenv("MICROSOFT_ENTRA_CLIENT_SECRET")

task_message = f"""........"""

chat_result = user_proxy.initiate_chat(entra_agent, message=task_message)

if chat_result:
    cost_info = chat_result.cost
    if cost_info:
        print(cost_info)
        yes_cache = cost_info.get("usage_including_cached_inference", 0)
        no_cache = cost_info.get("usage_excluding_cached_inference", 0)
        print(f"\n\nTotal Cost with cached inference: {yes_cache}")
        print(f"Total Cost without cached inference: {no_cache}")
    else:
        print("Token usage information is not available.")
else:
    print("No ChatResult object returned.")
```

Output during the first execution:
Output during the second execution:
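For reference, `chat_result.cost` in 0.2 is a nested dict, roughly shaped as below (as I understand the layout; every number here is an illustrative placeholder, not my actual output):

```python
# Rough shape of chat_result.cost in AutoGen 0.2 (all numbers are placeholders).
cost_info = {
    "usage_including_cached_inference": {  # everything, cached or not
        "total_cost": 0.0123,
        "gpt-4": {  # one entry per model used
            "cost": 0.0123,
            "prompt_tokens": 350,
            "completion_tokens": 120,
            "total_tokens": 470,
        },
    },
    "usage_excluding_cached_inference": {  # only tokens actually sent to the API
        "total_cost": 0.0,  # drops to 0 once every response comes from the cache
    },
}

# Pulling the headline numbers instead of printing the whole sub-dicts:
with_cache = cost_info["usage_including_cached_inference"]["total_cost"]
without_cache = cost_info["usage_excluding_cached_inference"]["total_cost"]
print(f"Total cost incl. cached inference: {with_cache}")
print(f"Total cost excl. cached inference: {without_cache}")
```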
Does this mean the cache cuts down the token cost? If so, does it halve the cost or remove it entirely? Cheers! Thanks in advance!
Replies: 1 comment 1 reply
We stopped updating cost tracking, so the cost it reports is likely way off and unreliable now. (To your underlying question: a cache hit returns the stored response without calling the API, which is why `usage_excluding_cached_inference` drops to 0 on your second run.) Please use the latest version, which focuses on tracking token usage in the model clients.
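In the new API, per-request token usage comes back on the result object and the client keeps running totals. A minimal sketch, assuming the `autogen-ext` OpenAI client (for Azure you would use `AzureOpenAIChatCompletionClient` with your deployment details; the model name below is just an example):

```python
# Minimal sketch of token-usage tracking with an AutoGen 0.4 model client.
# Assumes autogen-core / autogen-ext are installed and OPENAI_API_KEY is set.
import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o")  # example model name

    result = await client.create([UserMessage(content="Hello!", source="user")])
    # Per-request usage travels on the result:
    print(result.usage)  # RequestUsage(prompt_tokens=..., completion_tokens=...)

    # The client also keeps running counters across calls; see the docs for
    # the exact semantics of each:
    print(client.total_usage())
    print(client.actual_usage())

    await client.close()


asyncio.run(main())
```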