
The world of AI models is vast, complex, and rapidly evolving. New capabilities emerge daily, and what seemed ideal at the project's start might no longer be the best choice by its conclusion. Companies frequently find themselves overwhelmed, wondering whether GPT-4 offers advantages over Claude, or if specialised open-source models would better suit their needs.
There are several critical factors to think about when selecting an AI model, including:
- Vendor alignment: Ensure your chosen provider aligns seamlessly with your current technology stack, vendor relationships, and support requirements. Vendor reliability, technical support, and roadmap clarity matter significantly.
- Data residency and compliance: With growing global regulations such as GDPR or the AI Act, it's crucial to select models that respect your data residency and compliance needs. Confirm the vendor's data handling practices and jurisdictions.
- Cost efficiency: Evaluate the financial viability. Consider not just the upfront costs but also operational expenses, scaling costs, and potential hidden fees associated with your usage patterns.
- Performance and speed: Different use cases require different performance benchmarks. Is your priority fast responses or accuracy and depth of insights? Understand clearly what matters most for your application.
- Intelligence and capabilities: Consider what specific capabilities you require, such as:
- Coding proficiency
- Structured data outputs
- Advanced reasoning or context awareness
- Tool integration and external API calls
Why It's Difficult to Make the 'Right' Choice
Even with meticulous planning, predicting the perfect model can feel impossible. By the time your project reaches completion, advancements or new competitors may have entirely reshaped the landscape, making your initial choice less optimal.
The reality is that pouring all your effort into choosing the 'perfect' model upfront isn't feasible. Instead, opt for a practical and strategic approach.
First, select a 'good enough' model: one that sufficiently meets your current needs and budget. Then, build flexibility into your solution. Structure your project with modularity and adaptability in mind, enabling easy integration and swapping of different models without major disruptions.
At Tangent, flexibility isn't just a concept; it's foundational to how we design and deliver AI-driven solutions. Our approach prioritises modular architectures that allow seamless model interchangeability.
For instance, our conversational AI solutions integrate multiple language models simultaneously, leveraging each one's strengths. This modularity means we can quickly adopt newer or better-performing models as they emerge, without significant effort or re-engineering costs. Clients benefit immediately from cutting-edge developments without being locked into outdated technology.
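One common way to achieve this kind of interchangeability is to code against a thin, provider-agnostic interface rather than a specific vendor SDK. A minimal Python sketch of the idea (the class names and stubbed responses below are illustrative assumptions, not a real SDK):

```python
from typing import Protocol


class ChatModel(Protocol):
    """Minimal interface every model adapter must satisfy."""

    def complete(self, prompt: str) -> str: ...


class HostedModel:
    """Adapter for a vendor-hosted model; the SDK call is stubbed for illustration."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        # In a real adapter, this would call the vendor's SDK.
        return f"[{self.model_name}] response to: {prompt}"


class LocalModel:
    """Adapter for a self-hosted open-source model."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        # In a real adapter, this would call a local inference server.
        return f"[{self.model_name}] response to: {prompt}"


def answer(model: ChatModel, question: str) -> str:
    # Application code depends only on the ChatModel interface,
    # so swapping vendors is a one-line change at the call site.
    return model.complete(question)
```

Because callers only see `ChatModel`, replacing `HostedModel("some-model")` with `LocalModel("another-model")` requires no changes to the rest of the application.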
So what could this look like in practice?
- Initial phase: Your team selects GPT-4.1 due to its advanced reasoning
- Mid-project: A new open-source model appears, offering significant cost advantages
- Final phase: Thanks to a flexible architecture, your team integrates the new model rapidly, benefiting from immediate cost savings without compromising performance
Reflect on your current implementation: how easily could your project adapt to a similar scenario? If adapting seems daunting, perhaps it's time to explore a more flexible approach.
For most standard tasks, any of the models from the leading AI companies would be suitable. The main driving factors would be:
- Availability in region: while all models are available in the US, availability in the EU/UK is often limited
- Availability within a certain platform: for example, one could wish to host the whole application, including the LLM, on AWS Bedrock or Azure. We discourage this approach, as it limits both the model families and the specific models available
- Rate limits (tokens per minute/hour): these vary both between vendors and between models, and are an often overlooked aspect
- Cost and flexibility: we recommend picking a vendor that offers at least three pricing/performance tiers, as this allows choosing the right model per task. As of September 2025, all three major LLM vendors provide three tiers of pricing
- Ability to use Zero Data Retention and to turn off or configure default guardrails
- Maximum context length
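The pricing-tiers point above can be made concrete with a small routing table that maps each task type to a tier, keeping the tier-to-model mapping as the only thing that changes when you switch vendors. A hedged sketch (the task names and placeholder model names are assumptions for illustration):

```python
# Map each task type to a pricing/performance tier.
TASK_TIERS = {
    "classification": "small",    # cheap and fast
    "summarisation": "medium",    # balanced
    "agentic_planning": "large",  # most capable, most expensive
}

# Per-vendor tier mapping; the model names here are placeholders.
MODELS = {
    "small": "vendor-model-mini",
    "medium": "vendor-model-standard",
    "large": "vendor-model-pro",
}


def model_for(task: str) -> str:
    """Pick the model for a task, defaulting to the middle tier."""
    tier = TASK_TIERS.get(task, "medium")
    return MODELS[tier]
```

Switching vendors then means updating only the `MODELS` dictionary, while per-task routing decisions stay untouched.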
Now things get much more difficult when specific tasks are considered, such as:
- Data extraction
  - NER (named entity recognition)
  - Data & key-value extraction
- Documents
  - Document classification & tagging
  - Multi-document summarisation & aggregation
  - Multilingual understanding & translation
  - Paraphrasing & rewriting
- Tools
  - Working with structured data (tables, JSON, XML)
  - Structured output generation & schema validation
  - Tool use & API orchestration
- Code
  - Code generation (functions, scripts, automation)
  - Code understanding & refactoring
  - Debugging & test generation
- Knowledge retrieval & RAG integration performance
- Conversations
  - Question answering (extractive & generative)
  - Compliance & policy adherence (e.g. GDPR filtering)
  - Conversational consistency & multi-turn reasoning
  - Retaining a specific style
- Advanced reasoning
  - Math & symbolic reasoning
  - Logical problem solving & chain-of-thought reasoning
  - Planning & multi-step reasoning (agentic tasks)
While public benchmarks exist for many of these areas, we strongly recommend building your own benchmark data and running the models you're considering against it. Model performance can vary significantly depending on the actual tasks you need performed.
Building benchmarks is also a prerequisite to using open-source LLMs. If you want to run an open-source model, you need to pick the right one, possibly fine-tune it, and be able to reliably test its performance against a representative benchmark.
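A custom benchmark doesn't need to be elaborate: a set of representative inputs paired with expected outputs, plus a scoring function, is enough to compare candidate models. A minimal sketch (the invoice-extraction cases and the exact-match metric are assumptions; swap in your own task and metric):

```python
from typing import Callable

# Each case pairs a representative input with its expected output.
BENCHMARK = [
    {"input": "Invoice #123, total: 99 EUR", "expected": "99 EUR"},
    {"input": "Invoice #456, total: 10 GBP", "expected": "10 GBP"},
]


def evaluate(run_model: Callable[[str], str]) -> float:
    """Return the fraction of benchmark cases the model answers exactly right."""
    correct = sum(
        1 for case in BENCHMARK if run_model(case["input"]) == case["expected"]
    )
    return correct / len(BENCHMARK)
```

To compare candidates, call `evaluate()` with each model's inference function and rank the resulting scores; the same harness then doubles as a regression test when you fine-tune or swap models later.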
Looking for support with your AI project? Or need guidance on how to approach LLMs and delivery? Get in touch with us today.