Open-Source AI Guardrails Can Be Removed in Minutes, Researchers Find

Open-source artificial intelligence models often come with built-in safety features — but a new finding shows those protections can be stripped away in less than ten minutes. Researchers who tested several widely used open-source AI systems say the ease of removal exposes a gap in current regulatory frameworks that were designed mainly for proprietary, closed-source models.

The vulnerability in open-source models

Guardrails in AI systems are meant to prevent harmful outputs — blocking requests for illegal activity, hate speech, or dangerous instructions. In proprietary models like those from major tech companies, these guardrails are tightly integrated and hard to bypass. But in open-source models, the code is publicly available. That allows anyone with technical skill to locate the guardrail code and modify or delete it.

Researchers found that in some cases, simply deleting a few lines of code or adjusting a configuration file removed the restrictions entirely. The entire process, they said, can take under ten minutes. The speed of the attack means even relatively unsophisticated users could disable safety measures intended to prevent misuse.

Regulatory frameworks lag behind

The finding lands as governments around the world scramble to write rules for AI. The European Union’s AI Act, for example, focuses heavily on risk categories and transparency requirements for developers. But many of those rules were crafted with proprietary, centrally controlled AI systems in mind. Open-source models, by design, are decentralized and often distributed without ongoing oversight from the original creator.

That creates a blind spot. If a company releases an open-source model with guardrails, but anyone can then release a version without them, the original developer may not be held accountable for downstream misuse. Current regulations rarely address this scenario directly.

For the tech industry, the ease of removing guardrails raises questions about how to responsibly release open-source AI. Some developers have argued that open models promote transparency and innovation. But critics say that without enforceable safety standards, these models could be used to generate disinformation, malware, or hateful content at scale.

There is no consensus yet on what the solution should look like. Some have proposed requiring all open-source models to include cryptographic signatures that verify guardrails are intact. Others suggest that liability rules should apply to the creators of models that are modified and redistributed without safeguards. Neither approach has gained widespread regulatory support.
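To make the first proposal concrete, here is a minimal sketch of an integrity check on a guardrail configuration. It is an illustration under stated assumptions, not a scheme the researchers or any regulator have described: the function names and the sample configuration are hypothetical, and a plain SHA-256 digest comparison stands in for a real public-key signature.

```python
import hashlib
import hmac


def guardrail_digest(config_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of a guardrail configuration."""
    return hashlib.sha256(config_bytes).hexdigest()


def guardrails_intact(config_bytes: bytes, published_digest: str) -> bool:
    """Check a configuration against the digest the developer published.

    Assumption: the developer publishes the digest through a trusted channel.
    A real scheme would sign it with a private key rather than rely on a
    bare hash.
    """
    return hmac.compare_digest(guardrail_digest(config_bytes), published_digest)


# Hypothetical configuration shipped with a model.
original = b'{"block_categories": ["malware", "hate_speech"]}'
published = guardrail_digest(original)

# The same file after someone empties the block list.
tampered = b'{"block_categories": []}'

print(guardrails_intact(original, published))
print(guardrails_intact(tampered, published))
```

A check like this can only tell a downstream user that a copy has been altered. Because the checking code is as open as the guardrails themselves, anyone who strips the safeguards can also strip the check, which is the enforcement problem the article describes.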

For now, the responsibility falls largely on the organizations that choose to deploy open-source models. But the researchers’ findings suggest that without broader regulatory action, the guardrails that exist today offer little more than a false sense of security.

Whether regulators will move to close that gap — and how they would enforce rules on a decentralized ecosystem — remains an open question.