Declined

A Comparative Evaluation of Large Language Models for Enterprise Deployment: Performance, Safety, Cost, and Scalability Using a Multi-Criteria Decision Framework

Authors

Jurgen Mecaj
Mediterranean University of Albania

Abstract

The proliferation of large language models (LLMs) across enterprise, research, and public-sector applications has created an urgent need for rigorous, multi-dimensional evaluation frameworks capable of guiding model selection beyond single-metric leaderboards. This paper presents a comprehensive comparative analysis of seven state-of-the-art LLMs — GPT-4o (OpenAI), Claude 3.5 Sonnet (Anthropic), Gemini 1.5 Pro and Flash (Google DeepMind), LLaMA 3 70B (Meta AI), Mistral Large (Mistral AI), and Claude 3 Haiku (Anthropic) — across eight evaluation dimensions: benchmark accuracy, safety alignment, cost efficiency, inference latency, context handling, deployment flexibility, multilingual capability, and scalability. The study applies a weighted Multi-Criteria Decision Analysis (MCDA) framework to produce transparent composite rankings from empirical benchmark data. The evaluation leverages five standardized benchmarks (MMLU, HumanEval, HellaSwag, GSM8K, MATH), official API pricing data, throughput metrics, and curated safety evaluation datasets. Results indicate that Claude 3.5 Sonnet achieves the highest MCDA composite score (0.801), driven by combined strengths in accuracy (90.4% MMLU, 92.0% HumanEval) and safety alignment (4.9/5). Gemini 1.5 Flash emerges as the optimal choice for cost-sensitive, high-throughput deployments ($0.075/1M tokens; 210 tok/s). The paper additionally analyzes architectural trade-offs between dense transformers and Mixture-of-Experts (MoE) designs, examines scaling-law evidence and safety evaluation profiles, and provides a nine-row deployment recommendation matrix. This work contributes an extensible, evidence-based decision framework with practical guidance for practitioners, researchers, and enterprise decision-makers navigating the rapidly evolving LLM ecosystem.
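The weighted-sum MCDA scoring the abstract describes can be sketched in a few lines. The sketch below is illustrative only: the model data, criteria, and weights are placeholders, not the paper's actual values or its exact normalization scheme (which the abstract does not specify); min–max normalization with inversion for cost-like criteria is assumed here.

```python
# Minimal sketch of weighted-sum MCDA composite scoring.
# All inputs below are hypothetical, NOT the paper's data or weights.

def minmax_normalize(values, higher_is_better=True):
    """Scale raw criterion values to [0, 1]; invert for cost-like criteria."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def mcda_composite(models, criteria, weights):
    """Return {model: weighted composite score in [0, 1]}.

    models:   {name: {criterion: raw_value}}
    criteria: {criterion: higher_is_better flag}
    weights:  {criterion: weight}, summing to 1
    """
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    names = list(models)
    # Normalize each criterion column across all models.
    normalized = {
        c: minmax_normalize([models[m][c] for m in names], higher)
        for c, higher in criteria.items()
    }
    # Composite score = weighted sum of normalized criterion values.
    return {
        m: sum(weights[c] * normalized[c][i] for c in criteria)
        for i, m in enumerate(names)
    }

# Illustrative two-model, two-criterion example: MMLU accuracy is
# higher-is-better; price per 1M tokens is lower-is-better.
models = {
    "model_a": {"mmlu": 90.4, "price": 3.00},
    "model_b": {"mmlu": 78.9, "price": 0.075},
}
criteria = {"mmlu": True, "price": False}
weights = {"mmlu": 0.6, "price": 0.4}
scores = mcda_composite(models, criteria, weights)
```

With these toy inputs, model_a tops the accuracy criterion and model_b tops the (inverted) price criterion, so the composites are simply the corresponding weights, 0.6 and 0.4. The paper's framework applies the same idea over eight criteria and seven models.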

Publication Info

Submitted
02 April 2026

Original Article


Publication History

Submitted

02 Apr 2026

Editorial Decision

09 Apr 2026