How to Detect and Fix AI Issues with RagaAI

Jigar Gupta

Feb 16, 2024

As a data scientist, one of the biggest challenges is detecting issues with AI/ML applications, diagnosing them, and then fixing them iteratively. In this blog, we will provide a tour of how data scientists can use the RagaAI platform to perform all of these steps in a structured, scientific manner.


RagaAI is the most comprehensive testing platform for ML/AI applications, with 300+ tests powered by RagaAI DNA embeddings. It is designed to be multi-modal (supporting LLMs, tabular data, computer vision and other applications) from the ground up and is built for data science and engineering teams. You can learn more about the RagaAI platform here and try the product here.


RagaAI Testing Platform

For the purpose of this blog, we will perform a full model improvement iteration (test, diagnose and fix) for an object detection model (YOLOv7) trained on the open-source BDD (Berkeley DeepDrive) dataset. This can be executed systematically and scientifically on the RagaAI platform in four steps:


Step 1: Upload Dataset and Model Inferences

Let us start by uploading our dataset for the automotive object detection application to the RagaAI platform. For this purpose, we’ll upload the test dataset images, annotations and model inferences. We do not need to upload the model itself, given that we’re performing black-box testing on the platform, which relies only on the model’s inputs and outputs.
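The exact upload calls depend on the RagaAI SDK, so they are not shown here. Conceptually, though, black-box testing means each test datapoint reduces to a small record bundling the image, its ground-truth annotations and the model's inferences. The sketch below (the `make_datapoint` helper and the metadata keys are illustrative assumptions, not the SDK's schema) shows the shape of that record:

```python
# Illustrative sketch only: the real RagaAI SDK schema may differ.
# Black-box testing needs only model inputs and outputs, so each test
# datapoint bundles the image path, ground-truth annotations, and the
# model's predictions -- no model weights are required.

def make_datapoint(image_path, annotations, inferences, metadata=None):
    """Bundle one image's ground truth and model output for upload."""
    return {
        "image": image_path,
        "annotations": annotations,   # ground-truth boxes: (label, x, y, w, h)
        "inferences": inferences,     # predicted boxes: (label, x, y, w, h, score)
        "metadata": metadata or {},   # attributes used later for slicing
    }

dp = make_datapoint(
    "bdd/test/0001.jpg",
    annotations=[("car", 120, 80, 60, 40)],
    inferences=[("car", 118, 82, 62, 38, 0.91)],
    metadata={"time_of_day": "night"},
)
```

The metadata attributes (here a hypothetical `time_of_day` tag) are what make scenario-level analysis like the failure mode test in Step 2 possible.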

Step 2: Detect Issues - Failure Mode Analysis

Next, let us assess the model’s performance on the test dataset. For this, we’ll run RagaAI’s Failure Mode Analysis. This test is designed to help users identify edge cases where the model performs poorly. You can read more about it here.

From the failure mode analysis results, it is clear that while model performance across the test dataset as a whole is acceptable, it is very poor in one specific scenario: night-time conditions, where model precision is just 41%.
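To build intuition for what failure mode analysis surfaces, here is a toy version of the underlying idea: slice detection results by a metadata attribute and compare precision per slice. The `precision_by_slice` helper and the counts below are made up for illustration; they are not RagaAI's implementation or real BDD numbers.

```python
# Toy illustration of slicing model results by a metadata attribute
# and computing precision per slice (precision = TP / (TP + FP)).
from collections import defaultdict

def precision_by_slice(results, attr):
    """results: per-image dicts with true/false positive counts."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    for r in results:
        key = r[attr]
        tp[key] += r["tp"]
        fp[key] += r["fp"]
    return {k: tp[k] / (tp[k] + fp[k]) for k in tp}

results = [
    {"time_of_day": "day", "tp": 90, "fp": 10},
    {"time_of_day": "night", "tp": 41, "fp": 59},
]
print(precision_by_slice(results, "time_of_day"))
# {'day': 0.9, 'night': 0.41}
```

Aggregate precision over both slices would look fine (131/200), which is exactly why per-scenario slicing is needed to expose the night-time failure.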

Step 3: Diagnose Issues - Data Drift

Next, let us understand why the model’s performance on this cluster might be poor. There could be many reasons, from data issues to model training quality. The leading cause of poor performance on edge cases is data drift: the model was never trained on such datapoints in the first place. Let us execute a data drift test on the RagaAI platform to check whether this explains our object detection model’s poor performance in night-time scenarios. You can read more about RagaAI’s data drift test here.

From the data drift test results, it is clear that most of the night-time datapoints in the test dataset are out of distribution (shown in red) with respect to the training dataset. This explains the model’s poor generalisation to night-time scenarios.
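A minimal sketch of the idea behind such a drift test (an assumption about the general technique, not RagaAI's actual method): embed each datapoint, then flag test points whose embedding lies far from every training embedding as out of distribution. The `flag_ood` helper and the toy 2-D embeddings are hypothetical.

```python
# Toy out-of-distribution check via nearest-neighbour distance
# in embedding space. Real systems use learned, high-dimensional
# embeddings (e.g. RagaAI DNA); 2-D points keep the example readable.
import math

def nearest_distance(point, train_points):
    return min(math.dist(point, t) for t in train_points)

def flag_ood(test_points, train_points, threshold):
    """True for each test point farther than `threshold` from all training points."""
    return [nearest_distance(p, train_points) > threshold for p in test_points]

train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # e.g. a daytime cluster
test = [(0.5, 0.5), (5.0, 5.0)]               # second point sits far away
print(flag_ood(test, train, threshold=1.0))   # [False, True]
```

In the real test, the red points in the drift visualisation play the role of the flagged `True` entries here: test data the model never saw anything like during training.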

Step 4: Fix Dataset Issues - Active Learning

Now that we know the cause of the poor model performance, let us fix it by adding night-time datapoints to the training dataset and fine-tuning the model. However, how many datapoints to add, and which ones to select, is an important decision that directly affects the quality of fine-tuning. This is where we’ll use RagaAI’s datapoint selection methodology to craft our new dataset. You can read more about this test here.

This datapoint selection test for active learning helps users select the most relevant datapoints for model fine-tuning (the green datapoints). It works by maximising coverage of the areas where we see data drift while minimising the number of points selected, which lowers labelling costs, speeds up iteration and maintains training dataset balance.
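The coverage-versus-count trade-off described above can be sketched as a greedy max-coverage selection. This is an assumption about the kind of strategy such a test might use, not RagaAI's exact algorithm; `greedy_select` and the toy points are hypothetical.

```python
# Toy greedy selection: pick the fewest drifted points such that every
# drifted point lies within `radius` of some selected point.
import math

def greedy_select(points, radius):
    uncovered = set(range(len(points)))
    selected = []
    while uncovered:
        # pick the candidate covering the most still-uncovered points
        best = max(
            uncovered,
            key=lambda i: sum(
                1 for j in uncovered if math.dist(points[i], points[j]) <= radius
            ),
        )
        selected.append(best)
        uncovered -= {
            j for j in uncovered if math.dist(points[best], points[j]) <= radius
        }
    return selected

drifted = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0)]
picks = greedy_select(drifted, radius=0.5)
print(len(picks))  # 2: one pick covers the tight cluster, one the outlier
```

Selecting two representatives instead of labelling all four drifted points is the whole value proposition: the same drift coverage at a fraction of the labelling cost.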

Conclusion

In this blog, we’ve tested an AI application end-to-end using the RagaAI platform. This helped us detect an issue (poor performance in night-time scenarios), identify its root cause (data drift) and fix the problem (with active learning). This is a sample of how the RagaAI testing platform brings systematisation and scientific rigour to AI testing. The impact is clear: it helps data science teams accelerate AI development by 3x, through faster data and model iterations, and reduce issues in production by at least 90%.

