A chain of critical remote code execution vulnerabilities has been discovered across several major AI inference frameworks used by Meta, Nvidia, Microsoft, and leading open-source projects. The flaws, uncovered by researchers at Oligo Security, reveal how insecure code patterns were copied across repositories, exposing the global AI infrastructure to widespread and systemic security risks.
How the Vulnerability Spread Across AI Frameworks
Researchers found that the initial security flaw originated in Meta’s Llama Stack, where developers used ZeroMQ’s recv_pyobj() method to receive data and immediately deserialize it with Python’s pickle.loads(). This approach created a dangerous pathway for arbitrary code execution over unauthenticated network sockets.
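The affected frameworks receive messages over ZeroMQ, whose recv_pyobj() simply runs pickle.loads() on whatever bytes arrive. As a minimal sketch of that unsafe shape (not the frameworks’ actual code), the hypothetical handler below reproduces the same pattern with a plain standard-library socket: bytes from an untrusted peer are fed straight into pickle.loads():

```python
import pickle
import socket

def handle(conn: socket.socket) -> object:
    """Hypothetical handler showing the ShadowMQ-style anti-pattern.

    pickle.loads() on untrusted input will execute whatever callables
    the sender embedded in the byte stream -- this is the RCE vector.
    """
    data = conn.recv(65536)
    return pickle.loads(data)  # arbitrary code can run here
```

The sender fully controls what pickle.loads() reconstructs, so any peer that can reach the socket can run code on the server.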
Oligo Security researcher Avi Lumelsky explained that the real issue was how this insecure code traveled unmodified into other frameworks. During the investigation, the team found nearly identical code fragments across Nvidia TensorRT-LLM, vLLM, SGLang, and Modular Max Server. In some cases, files even contained comments stating they were adapted from vLLM.
The firm has named this phenomenon the ShadowMQ pattern, referring to a hidden communication-layer flaw that spreads through copy-and-paste reuse rather than clean implementation. Because AI inference frameworks form the foundation of high-value enterprise systems, the flaw’s replication represents a serious security concern.
Understanding the Technical Risk
Python’s pickle is known for its ability to execute arbitrary code during deserialization. While acceptable in closed environments, it becomes highly dangerous when exposed through a network protocol like ZeroMQ. Oligo said it discovered thousands of publicly reachable ZeroMQ sockets tied to AI inference servers, magnifying the potential impact.
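The reason pickle is dangerous is that a serialized object can specify, via __reduce__, a callable to invoke at load time. The benign demonstration below (using a hypothetical side_effect() function in place of a real payload such as os.system) shows that code runs during pickle.loads() itself, before the receiver ever inspects the object:

```python
import pickle

EVIDENCE = []  # mutated as a side effect to prove code executed

def side_effect(msg: str) -> None:
    EVIDENCE.append(msg)

class Exploit:
    def __reduce__(self):
        # pickle will call this callable with these args at load time;
        # an attacker would return something like (os.system, (cmd,)).
        return (side_effect, ("code ran during pickle.loads()",))

blob = pickle.dumps(Exploit())
pickle.loads(blob)  # triggers side_effect() -- no validation possible
```

Nothing about the receiving code opted in to this: deserializing the bytes is enough, which is why exposing pickle over a network socket is equivalent to remote code execution.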
If exploited, the flaws could allow attackers to:
• Execute arbitrary code on GPU clusters
• Escalate privileges within an AI environment
• Steal proprietary model weights or customer data
• Install GPU-based cryptominers
• Compromise cloud AI infrastructure used by major organizations
SGLang, one of the affected frameworks, is already used by companies such as xAI, AMD, Nvidia, Intel, LinkedIn, Oracle Cloud, Cursor, and Google Cloud. This widespread adoption compounds the overall risk.
Frameworks Now Patched After Industry-Wide Disclosure
Oligo coordinated disclosure with all affected parties and reported the initial vulnerability (CVE-2024-50050) to Meta in September 2024. Meta quickly replaced pickle-based deserialization with safer JSON logic and updated Llama Stack beginning with version 0.0.41.
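Meta’s actual patch differs in detail, but the safer shape it moved to can be sketched as follows: parse incoming bytes as JSON, which can only produce plain data (dicts, lists, strings, numbers) and cannot instantiate objects or invoke code. The decode_message() helper name here is illustrative, not from Llama Stack:

```python
import json

def decode_message(raw: bytes) -> dict:
    """Illustrative JSON-based replacement for pickle deserialization.

    json.loads yields only plain data types, so malicious input can at
    worst be malformed -- it cannot execute code on the receiver.
    """
    msg = json.loads(raw.decode("utf-8"))
    if not isinstance(msg, dict):
        raise ValueError("expected a JSON object")
    return msg
```

The trade-off is that only plain data crosses the wire, so any richer objects must be explicitly reconstructed by the receiver, which is exactly what makes the approach safe.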
Additional CVEs were assigned for other frameworks, including:
• CVE-2025-30165 for vLLM
• CVE-2025-23254 for Nvidia TensorRT-LLM
• CVE-2025-60455 for Modular Max Server
All vendors have since issued patches, and updated versions are available to enterprise users. The discovery highlights a growing pattern of vulnerabilities within AI infrastructure, where unsafe design choices are inadvertently replicated across an increasingly interconnected ecosystem.
Why This Matters for Enterprise AI Infrastructure
AI inference servers are now central to enterprise operations, handling sensitive prompts, responses, model computations, and customer data. As the use of generative AI expands, AI workloads are increasingly deployed across cloud clusters and GPU farms, making security failures far more consequential.
The ShadowMQ flaw demonstrates that AI vendors may be unknowingly introducing systemic vulnerabilities through code reuse. With many enterprises deploying multiple frameworks within the same environment, a single malicious entry point could compromise an entire AI pipeline.
Mitigation and Recommendations
Oligo advises that organizations upgrade immediately to patched versions, including:
• Meta Llama Stack 0.0.41 or later
• Nvidia TensorRT-LLM 0.18.2 or later
• vLLM 0.8.0 or later
• Modular Max Server v25.6 or later
Additional mitigation steps include:
• Avoiding pickle deserialization for untrusted data
• Adding authentication layers such as HMAC or TLS to ZeroMQ communication
• Tightening network exposure and isolating inference servers
• Training development teams on secure serialization practices
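For the HMAC recommendation above, one possible sketch (the key name and frame format are assumptions, not any framework’s actual protocol) is to sign each JSON frame with a pre-shared key and reject anything whose tag fails a constant-time comparison before parsing it:

```python
import hashlib
import hmac
import json

SHARED_KEY = b"replace-with-a-provisioned-secret"  # hypothetical key

def sign(payload: dict) -> bytes:
    """Serialize payload as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return tag.encode("ascii") + b"." + body  # hex tag contains no "."

def verify(frame: bytes) -> dict:
    """Check the tag before parsing; reject tampered or unsigned frames."""
    tag, _, body = frame.partition(b".")
    expected = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag.decode("ascii"), expected):
        raise ValueError("HMAC check failed: message rejected")
    return json.loads(body)
```

This authenticates messages but does not encrypt them; for confidentiality on top of integrity, ZeroMQ’s built-in CURVE support or a TLS tunnel would be the stronger choice.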
As AI adoption accelerates, the cybersecurity community warns that similar issues may continue to arise unless best practices are reinforced across both commercial and open-source development. The ShadowMQ incident underscores the need for stronger security culture in the AI tooling ecosystem, especially as AI workloads become critical components of global enterprise infrastructure.