The New Frontier of Security: Creating Safe and Secure AI Models

Wednesday, January 29, 2025


Are you looking to safely create the next state-of-the-art AI model? Today we’re sharing a list of recommendations on how to create and distribute your models securely.


Choosing the Right Foundation: Safe Model Formats

Before you start building your model, consider using a safe file format, as it can influence your development tool options. However, if you've already created a model, you can also convert it to a safe format before sharing it.

Once trained, models are saved and distributed as binary files. Common formats include PyTorch pickle files (usually with .pt or .pth extensions), TensorFlow SavedModel (.pb), GGUF (.gguf), and Safetensors (.safetensors). However, binary files are dangerous because it's hard to verify whether their content is safe. This is especially true of formats such as pickles and SavedModels, which are designed to carry arbitrary code, raising the risk of remote code execution (RCE) on users' machines.

To mitigate these risks:

  • If sharing only model weights: Consider formats such as Safetensors. These formats contain only model weights and are therefore safe from RCE (see the sketch after this list).
  • If sharing weights and metadata: Consider formats like GGUF, which include weights and additional metadata but no executable code.
  • For any format, but especially if your model requires custom code: Keep reading to see how to help users verify that they're getting the correct model.
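
For example, here is a minimal sketch of round-tripping weights through Safetensors instead of pickle-based torch.save. It assumes the torch and safetensors packages are installed, and the tensor names and file name are placeholders:

```python
# A minimal sketch of saving and loading weights with Safetensors
# instead of pickle-based torch.save. Tensor names are placeholders.
import torch
from safetensors.torch import save_file, load_file

weights = {
    "linear.weight": torch.randn(4, 4),
    "linear.bias": torch.zeros(4),
}

# save_file stores raw tensors only -- no code objects -- so loading
# the file cannot execute arbitrary code the way unpickling can.
save_file(weights, "model.safetensors")

# load_file returns a plain dict of tensors; nothing is executed.
restored = load_file("model.safetensors")
assert torch.equal(weights["linear.bias"], restored["linear.bias"])
```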

Secure Releases and Verification Methods

To ensure your users are getting the model you originally deployed, consider automating your releases, making them transparent and auditable. Instead of training your model on your local machine, use a predefined script to train it within an isolated environment. When building smaller models, GitHub Actions can be a good option. However, for larger models, GitHub Actions might not have the necessary hardware capabilities or availability. In that case, and if budget allows, consider other platforms with proper security safeguards, such as Google Cloud Platform (GCP).
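
As a concrete illustration, the sketch below shows the kind of predefined training entrypoint such a pipeline might invoke. The script name, hyperparameters, and output path are all hypothetical; the point is that every input affecting the result is pinned in code, so a CI job can reproduce and audit the exact same run:

```python
# train.py -- a hypothetical, fully predefined training entrypoint.
# Everything that affects the output (seed, hyperparameters, output
# path) is fixed in code rather than chosen interactively.
import hashlib

import torch
from safetensors.torch import save_file

torch.manual_seed(0)  # fixed seed: no hidden local state

model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):
    x = torch.randn(32, 16)
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Export to a weights-only format and log a digest that the release
# pipeline can later sign and attest.
state = {k: v.detach() for k, v in model.state_dict().items()}
save_file(state, "model.safetensors")
digest = hashlib.sha256(open("model.safetensors", "rb").read()).hexdigest()
print(f"sha256:{digest}")
```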

If building your model on a cloud platform is not an option, you can sign the model on your local machine to give your users confidence it was created by you.
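
The local flow is conceptually simple: sign the model bytes with a key you control and publish the public key alongside the model. The sketch below uses a plain Ed25519 key from the Python cryptography library purely for illustration, with a placeholder file name; in practice, tools like Sigstore (discussed below) handle key management and transparency logs for you:

```python
# A minimal sketch of signing a model file locally with an Ed25519
# key (via the 'cryptography' package). Illustrative only.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

model_bytes = open("model.safetensors", "rb").read()
signature = private_key.sign(model_bytes)

# Users verify with the published public key; verify() raises
# InvalidSignature if the file was tampered with.
public_key.verify(signature, model_bytes)
print("signature verified")
```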

However, if building your model on a cloud platform, sign the model and generate a provenance attestation for the release. This allows users to confirm not only that the model was created by you, but also that it came from your approved infrastructure, was trained following the specific instructions defined in your training script, and wasn't tampered with by a malicious actor.

While signatures and provenance do not guarantee the absence of malicious intent from the developer, they provide users with a means to verify the integrity of the model they downloaded.

On GitHub, signing and provenance can be easily achieved using GitHub Artifact Attestations. More generally, tools like Sigstore and SLSA are also available to sign your releases and attest to their provenance.
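
As one concrete possibility, a user who downloads a model from a release that carries a GitHub Artifact Attestation could verify it with the GitHub CLI, as in the sketch below; the file name and organization are hypothetical placeholders:

```python
# A sketch of verifying a downloaded model against its GitHub
# Artifact Attestation using the GitHub CLI ('gh'). The file name
# and organization are hypothetical.
import subprocess

result = subprocess.run(
    ["gh", "attestation", "verify", "model.safetensors", "--owner", "my-org"],
    capture_output=True,
    text=True,
)

# A non-zero return code means the attestation could not be verified.
print(result.stdout or result.stderr)
```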

For a few examples, check out this workflow to build a model with SLSA on GitHub and on GCP, and an example of how to sign models.


Educate Your Users

After sharing your model with the world, it is essential to educate users on the safety and security concerns surrounding model consumption. You should therefore:

  • Document potential biases in your models and datasets.
  • Clearly display all licenses associated with the model and datasets.
  • Benchmark your model, assessing and disclosing metrics around hallucinations, prompt injection risk, and fairness.

With this information, users can make informed decisions and abide by the ethical and technical guidelines associated with your model. For instance, they might choose to implement an input sanitization layer to enhance their software's security.
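
One lightweight way to surface these disclosures is to ship them as structured metadata next to the model files. The sketch below writes a minimal, hypothetical model card; every field name and value is a placeholder you would replace with your own measurements:

```python
# A sketch of publishing license, bias, and benchmark disclosures as
# structured metadata alongside the model. All fields and values are
# hypothetical placeholders.
import json

model_card = {
    "license": "apache-2.0",
    "dataset_licenses": ["cc-by-4.0"],
    "known_biases": "Underrepresents non-English text; see README.",
    "benchmarks": {
        "hallucination_rate": 0.08,
        "prompt_injection_success_rate": 0.03,
        "fairness_gap": 0.02,
    },
}

with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```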


Establish a Security Policy

Your model’s privacy, safety, and security guarantees can be documented in separate files or in a security policy. A security policy helps you address users' safety and security concerns regarding your models. It is a dedicated file that instructs users on how to privately report vulnerabilities, such as prompt injection strategies or potential out-of-memory (OOM) errors, allowing you to investigate and address the potential vulnerabilities before they become public knowledge. It is also a good place to define the scope of what your project considers a vulnerability.
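
As a starting point, a minimal policy might look something like the hypothetical example below; the contact address and response window are placeholders:

```
# Security Policy (example)

## Reporting a vulnerability
Please report suspected vulnerabilities privately to
security@example.com. We aim to respond within 5 business days.

## Scope
In scope: prompt injection strategies, data leakage, and unsafe
deserialization of model files.
Out of scope: expected model limitations documented in the model card.
```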

In summary, considering model security from the outset of development is crucial. Additionally, ensuring safe distribution and informing users of potential risks is essential. It's important to remember that security is a continuous process – more like a marathon than a sprint – and constant vigilance is necessary to mitigate potential threats.


Keep Improving

The steps above will put your models' security on a solid footing, but there is always more you can learn and do. Please take a look at Google's Secure AI Framework for a deeper dive into this subject, and take its risk self-assessment to better understand which risks are most important for you.

By Gabriela Gutierrez and Pedro Nacht – GOSST Upstream Team
