Datali logodatali

Main Menu

What is red teaming and why you need it?

Red teaming: safe & secure GenAI 

Published: 3/4/2025, 1:00:00 AM

SUMMARY

🟡 How to ensure that your GenAI is safe and secure?
🟡 What is red teaming and why you may need it?
🟡 How to execute it properly?

Take the first step toward securing your generative AI.
Answer these critical questions today with Datali!

We know it from many movies. The good guy has to join the dark side to assess crucial information needed for winning the case for their team. Our hero bravery and cleverly infiltrates the enemy, learning both its thinking and tactics, and as a result, secures the glorious triumph of the good.
The lesson learnt from that may be as follows: sometimes we must get to know the adversarial side to ensure safety and security. Just as it happens in the stories, we can witness it in real-life scenarios, even in the GenAI and LLMs’ world. Here we introduce you to the red teaming for safe and secure GenAI.

The purpose of this text is to describe red teaming in LLM context. Starting with explaining the concept of red teaming, we'll outline its role within systematic measurement and mitigation. There will be a place to learn more about tactics used by the adversarial team. Finally, we will overview phases of red teaming, describing steps needed to lead a successful red teaming within the organization.

Red teaming - the hero that acts as a villain

Traditionally the concept of the red teaming originates from the Cold War. It was primarily used by the US to describe one of its military training tactics. Dividing its troops into two groups, each was assigned the colour, red or blue. The role of the red team was quite simple to grasp. They were to attack, mimicking the tactics and behaviours of the adversarial side. It's like being attacked by a friend who thinks and acts like the enemy. All to test you for a real-life attack.
As a result, the red team aims to:

  • identify weak points of the system
  • verify detection and prevention capabilities 
  • try out response capabilities

Red teaming within systematic measurements framework

Red teaming can be part of the other systematic measurement. It is not the replacement for the comprehensive testing. It is the practice that points out what needs to be further mitigated. As a feedback mechanism, it uncovers yet unknown gaps. In no way it will replace the development and implementation of valuable findings based on the deliveries. The strategic takeaways usually may address one of the typical adversarial targets. Choosing the right TTPs (tactics, techniques and procedures) during the red teaming may uncover it all.

Adversarial target

Example

Support users in illegal activities

Generating instructions for performing fraudulent accounting

Biases in the LLM itself

Discriminatory behaviours like different  performance reviews for native and non-native speakers

Generating violent and/or inappropriate content

Content including examples of racist, violence, sexism

Other: leaking personally identifying info or intellectual property infringement

The real adversarial attack may happen on various LLM levels, LLM base model level or application level. That's why if you train the model, you can be interested in exploring its vulnerabilities. For those using the model, it helps ensure its safe integration and functioning within the wider system. Red teaming will identify risks specific to the company's context and LLMs role. Depending on your capabilities and preferences, you can create the internal red team or partner with an external one.

Inside the red teaming - adversarial tactics, techniques, and procedures

Let’s assume you have just gathered your red team. They are ready to operate. They are about to start the attack and testing. What can you expect to happen exactly? The most anticipated scenario is testing via one of the most popular real-world tactics. Among them you can find:

Prompt Engineering. It's about designing prompts that will influence the model to deliver output not intended by the original model application. The model may not be able to differentiate between the original model instructions and the user input.

Exfiltration. The model can include intellectual property, sensible in nature. The red team will be to access the asset and copy it, for example, to develop its own model.

Data poisoning. Even a small percentage of control over the dataset may manipulate the model. The poisoned dataset can influence the model output to one's preferences.

Backdoor the model. It leads to delivering the wrong output after receiving the input with a specific word or a feature. To achieve such a result, the red team can focus on adjusting the model's weights or hiding the code in the model.

Adversarial examples. The designed, manipulated inputs are to mislead the model into producing incorrect or undesirable outputs. The inputs are crafted by applying small changes that take advantage of the model’s vulnerabilities.

Training data extraction. Here it's possible to test whether the sensitive or personal information was successfully removed from the training dataset. The red team attempts to extract it.

Of course, this is not a complete list. The new tactics are continuously researched and reported. Some are designed for a specific model or a specific model use case. The red team with a high level of expertise should be able to mimic them as well.

You know you need a hero, but how to prepare?

Resembling any operation, the red teaming also requires a few steps to fully grasp its potential and maximize the benefits. Like in almost every special mission, we’ve got several things to do before, during and after the action. 

Usually, the very first step is to find the right team that will perform the mission. The careful team composition will ensure the vital know-how of such tests. They will also have appropriate expertise in the area of LLM and GenAI. In reality, such attacks would be performed by experts in the field, able to bypass the security and design and implement new TTPs. We are to mimic it too. The red team can be derived from internal resources, as well as external ones. However, collaborating with external partners may enhance the entire experience. All thanks to introduced novelty and usually wider expertise. 

The next steps revolve around strategy, tactic and documentation. The developed strategy will answer the question of what to specifically test. For instance, LLM base model, user interaction, both of that or even something else. The tactic will cover how to test, whether we are interested in the open-end test via creatively exploring the vulnerabilities or by taking a more systematic approach with the list of pre-defined possible harms to investigate. The tactic can also mix these two approaches. However some level of prioritization for the specific harm in the iteration is highly welcomed. Finally comes the decision of how to deliver and document the results. You may be interested in recording the input used and the received output, the notes on how to reproduce the test, or in other findings.

Then it's time for the intended red teaming and monitoring its progress. Concluded, all left is to properly manage the findings. You can report it to the stakeholders alongside other relevant info. Remember to draw a line between mitigation and identification activities. Keep that in mind - red teaming is the latter. Although the precious part, it won't replace other stages. Only with them, red teaming will boost safety and security.

Want to know even more?

Join our AI newsletter!
You will unlock premium articles
and get the latest news!