So you’ve decided to make an LLM a key part of your daily workflow. Welcome to the world of AI! Let’s get down to brass tacks and make sure you’re following the best practices for prompting an AI agent, so you achieve optimal results for your business and individual workflow needs.
I will cover a few essential points to get you efficiently started on the path towards success:
Understanding how the AI agent works under the hood
Formatting your prompts so the agent works consistently in your favor
Managing your prompt templates as code
Deciding when to build your own AI-leveraging feature
Let’s start by defining, in plain English, how an LLM or AI agent like OpenAI’s ChatGPT or Anthropic’s Claude works under the hood. At the most basic level, these tools are predictive text engines: they calculate the probability of the next piece of language that should follow whatever they have been prompted with. The output can be a single character, a single word, or a multiparagraph essay on thermodynamics; it all comes down to how effectively you prompt the agent. If you have used a modern smartphone keyboard, you have likely already seen this in action: when you type a message and see suggestions above the keyboard for the next word, that is predictive text at work. If you type “One, two, three” into ChatGPT, it will most likely answer with “Four, five, six”, though current models may also ask clarifying questions about the desired answer before responding directly to the prompt.
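To make this concrete, here is a minimal sketch using OpenAI’s Python SDK, which can return the probabilities of candidate next tokens (the model name is illustrative, and exact outputs will vary):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the model to continue the phrase, and request the most likely
# candidate next tokens along with their log-probabilities.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "One, two, three"}],
    max_tokens=5,
    logprobs=True,
    top_logprobs=5,  # show the 5 most likely candidates per token
)

# Print the candidates for the first generated token, e.g. "Four" vs. ","
for candidate in response.choices[0].logprobs.content[0].top_logprobs:
    print(candidate.token, candidate.logprob)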
With this in mind, it makes sense that the more detailed and specific your prompt is, the better the answer will be for your specific use case. The more you set guardrails and define success for the LLM, the more likely it is to produce the result you are looking for.
In the counting scenario, that would mean giving the agent specific guidance on the output:
I want you to read the following phrase and only respond with the one word you think is most likely to follow.
“One, two, three”
With the guardrails provided, the answer will almost always be just “Four” with current models, rather than a longer and more unpredictable answer. This is the power of guardrails in prompting. Use this to your advantage in your workflow, particularly in client-facing contexts.
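As a sketch of how this looks in code (reusing the client from the sketch above), the guardrail goes in as a system message and the completion stays to a single word:

# Reusing the client from the previous sketch.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {
            "role": "system",
            "content": "Read the following phrase and respond with only "
                       "the one word you think is most likely to follow.",
        },
        {"role": "user", "content": "One, two, three"},
    ],
)

print(response.choices[0].message.content)  # expected output: Four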
There are several key format conventions you can follow to ensure a quality result. One is to use XML tags to separate out different aspects of your prompt. For example:
<Role>Scientific researcher</Role>
<Temperature>Low</Temperature>
<Task>Analyze the attached document and summarize the three main arguments made by the author. Immediately end the response once this is complete.</Task>
By separating the different aspects of the prompt into tagged sections, the LLM can consume the prompt with clearer definitions of its task, the role in which it is supposed to perform the task, and the aggressiveness with which it should employ creativity in answering (temperature). Note that temperature is formally a sampling parameter set on the API call itself; inside a chat interface, a tag like <Temperature> acts as a stylistic instruction rather than changing the actual parameter. A tightly tagged prompt will also typically use fewer tokens than the equivalent plain-language request. This means that as you scale your business needs or individual SME workflows to meet growing customer requests, your costs for leveraging the LLM will scale in a more manageable way, particularly in the custom API use case.
You can also specify the maximum number of tokens the LLM should spend on a request (the max_tokens parameter on most APIs) to better control the efficiency of the prompt in returning scalable results.
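As a sketch of both controls together (again using OpenAI’s Python SDK; other providers such as Anthropic expose equivalent parameters), temperature and the token cap are passed alongside the tagged prompt:

prompt = (
    "<Role>Scientific researcher</Role>\n"
    "<Task>Analyze the attached document and summarize the three main "
    "arguments made by the author. Immediately end the response once "
    "this is complete.</Task>"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0.2,  # low temperature: near-deterministic output
    max_tokens=300,   # hard cap on tokens spent on the answer
)
print(response.choices[0].message.content)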
Other key formatting considerations to use (a combined sketch follows this list):
Lead with a clear action verb: this ensures the agent knows exactly what the action focus is – analyze, write, create, generate, explain, list, etc.
Provide examples: include links, attach documents, or show formats of what you are looking for in the result.
Set the tone: professional, personal, or somewhere in between.
Assign a role: a criminal detective, a financial auditor, a banker, a lawyer, etc.
Tune the temperature: a lower temperature will introduce a near-zero level of randomness and creativity, whereas a higher temperature will encourage the opposite.
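Pulling those pieces together, here is a hypothetical helper (the build_prompt function is our own illustration, not part of any SDK) that assembles a tagged prompt from an action verb, a task, a role, and a tone:

def build_prompt(verb: str, task: str, role: str, tone: str) -> str:
    # Hypothetical helper: assembles the tagged structure shown earlier.
    return (
        f"<Role>{role}</Role>\n"
        f"<Tone>{tone}</Tone>\n"
        f"<Task>{verb.capitalize()} {task}</Task>"
    )

prompt = build_prompt(
    verb="summarize",
    task="the three main arguments made by the author of the attached document.",
    role="Scientific researcher",
    tone="Professional",
)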
Once you have viable prompts for your daily business needs and the only variables are user parameters, you will need to keep close track of the changes made to your prompts. This can be achieved by incorporating a prompt management platform into your team’s toolset; good examples are Pezzo, PromptLayer, and Agenta. With one of these tools, your prompts can be versioned in a granular way over time, and changes can be reverted quickly as needed. This also enables increased internal transparency into the success rate of your prompts: by adding metrics that directly correlate outcomes with specific prompt templates, you enable A/B testing and long-term growth visibility. Adding visibility to the often opaque nature of AI tooling is key to maintaining investment from your business’ stakeholders – from majority investors with capital at stake, to the C-suite looking for maximized growth, to management checking in on daily success.
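If you are not ready to adopt a platform, even a minimal in-house registry captures the core idea. The sketch below (all names are hypothetical, not any vendor’s API) versions templates and records which version served each request, which is the raw material for A/B metrics:

from datetime import datetime, timezone

class PromptRegistry:
    # Hypothetical minimal prompt-versioning store.

    def __init__(self):
        self._versions = {}  # template name -> list of template strings
        self.audit_log = []  # (name, version, timestamp) per request served

    def register(self, name: str, template: str) -> int:
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])  # 1-based version number

    def render(self, name: str, version: int, **params) -> str:
        template = self._versions[name][version - 1]
        self.audit_log.append((name, version, datetime.now(timezone.utc)))
        return template.format(**params)

registry = PromptRegistry()
v1 = registry.register(
    "summarize", "<Role>{role}</Role>\n<Task>Summarize: {text}</Task>"
)
prompt = registry.render("summarize", v1, role="Scientific researcher", text="...")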
While using the AI agent directly is powerful on its own, you will reach a point in your scaling where you may need to consider building your own tool that interfaces with the LLM directly, to resolve customer requests more efficiently and improve your margins.
Before you introduce an AI-leveraging feature to your business workflow, you will need to spend time doing due diligence on the cost structure of the software. For some models, the cost per API call will not be feasible at scale. This paradigm is changing quickly as the training cost for models decreases over time (see: DeepSeek V3/R1), but there will continue to be significant infrastructure costs associated with implementing the feature. Be sure to include the following (a rough cost model sketch follows the list):
Existing active user counts when estimating API calls (include overhead for future scale, generally up to 100%)
Additional infrastructure cost
Engineering cost to maintain the feature (whether in-house staff or contractors)
Implementation cost, including the opportunity cost to existing internal initiatives
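As a rough back-of-the-envelope sketch (every number below is an assumption to replace with your own measurements and your provider’s current rate card):

# Assumed inputs: replace with your own measurements and pricing.
active_users = 5_000
calls_per_user_per_month = 40
tokens_per_call = 1_500            # prompt + completion, averaged
price_per_million_tokens = 3.00    # USD; check your provider's rate card
scale_overhead = 1.0               # 100% headroom for future growth

monthly_calls = active_users * calls_per_user_per_month * (1 + scale_overhead)
monthly_tokens = monthly_calls * tokens_per_call
api_cost = monthly_tokens / 1_000_000 * price_per_million_tokens

print(f"Estimated monthly API spend: ${api_cost:,.2f}")
# Add infrastructure, maintenance, and implementation costs on top of
# the raw API spend to get the full picture.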
Once you’ve crunched the numbers and confirmed a business model that works for your use case, then you can proceed. Prompting an AI agent for your use case will continue to be a case of incorporating the best practices we’ve laid out thus far:
Understanding the fundamentals of how the AI agent works under the hood
Formatting your prompts to ensure the agent works consistently in your favor
Managing prompt templates as code
Your feature will be another tool that needs to be managed more closely than your other proprietary software, as the industry underlying it is shifting more quickly than others. Every aspect of the feature will need to be ready to shift with the new standards. If you are leveraging a single agent vendor such as Anthropic’s Claude or OpenAI’s ChatGPT, the granular best practices may vary and shift with each model release – and there are usually 2+ new releases per calendar year among the major players. As with any nascent technology, expect the changes to be swift and the oversight on them to be critical to your success.
It may also make sense for your use case to switch between agents based on context: if your product is a SaaS offering that is transparent about optimizing each client request for best results, it could be useful to let the end user switch between target models. This also helps spread the request load across models rather than concentrating your cost structure on calls to a single API.
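A sketch of a simple router (the routing rule and model names are illustrative placeholders, not recommendations):

def route_request(task_type: str, user_preference: str | None = None) -> str:
    # Hypothetical router: choose a target model per request context.
    if user_preference:  # let the end user override, if your UI exposes it
        return user_preference
    routes = {
        "summarization": "model-a",  # e.g. a cheaper, faster model
        "code_review": "model-b",    # e.g. a stronger reasoning model
    }
    return routes.get(task_type, "model-a")  # sensible default

model = route_request("summarization")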
This is where request optimization comes in handy: storing and serving common responses for general user requests keeps nonspecific traffic off the API entirely and prevents overload.
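A minimal caching sketch (call_model is a stand-in for whatever client call you actually use):

import hashlib

def call_model(prompt: str) -> str:
    # Stand-in for your real API client call.
    return f"(model response for: {prompt})"

_cache: dict[str, str] = {}

def cached_completion(prompt: str) -> str:
    # Normalize and hash the prompt so equivalent requests share a key.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only pay for the first request
    return _cache[key]

print(cached_completion("What are your support hours?"))
print(cached_completion("what are your support hours?  "))  # cache hit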
In short:
Deeply understand your use case before starting implementation
Build a highly specific cost structure
Consider using multiple models if it provides additional value
Be ready to move at light speed – the speed of change in the current AI sphere
Optimize, optimize, optimize
At Elevate Consulting, we are currently leveraging the latest AI models to level up our clients’ business revenue and streamline their existing workflows. Drop us a line to see what we can do for you!