
Evaluating the Effectiveness of AI Chatbots in Drug-Drug Interactions

The use of artificial intelligence (AI) in healthcare has gained significant attention in recent years, and AI chatbots are being explored for their potential to improve patient care. A recent study published in the journal Drug, Healthcare and Patient Safety evaluated the sensitivity, specificity, and accuracy of four popular AI chatbots, ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard, in predicting drug-drug interactions (DDIs).

Introduction to AI Chatbots in Healthcare

AI chatbots are built on models trained on vast amounts of text, which lets them answer natural-language questions about medications. In healthcare, they could be used to screen drug pairs for potential DDIs and thereby improve patient safety. However, how accurately these chatbots predict DDIs is not well established.
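As a concrete illustration of what such a DDI query might look like in practice, here is a minimal sketch using the OpenAI Python client; the model name, prompt wording, and example drug pair are assumptions for illustration and do not reflect the study's protocol:

```python
# Minimal sketch: asking a general-purpose chatbot whether two drugs interact.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment;
# the model name and prompt wording are illustrative, not the study's protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def check_ddi(drug_a: str, drug_b: str) -> str:
    """Ask the model whether a clinically relevant interaction exists."""
    response = client.chat.completions.create(
        model="gpt-4",  # hypothetical choice; any chat model could be swapped in
        messages=[
            {
                "role": "system",
                "content": "You are a clinical pharmacology assistant. "
                           "Answer 'Yes' or 'No', then briefly justify.",
            },
            {
                "role": "user",
                "content": f"Is there a clinically relevant drug-drug "
                           f"interaction between {drug_a} and {drug_b}?",
            },
        ],
    )
    return response.choices[0].message.content


print(check_ddi("warfarin", "fluconazole"))
```

A screening workflow along these lines would loop such a function over candidate drug pairs and compare each answer against a reference interaction checker.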

Methodology

The study compared the performance of four AI chatbots - ChatGPT-3.5, ChatGPT-4, Bing AI, and Bard - in detecting clinically relevant DDIs for 255 drug pairs. Each chatbot's answers were scored against conventional DDI clinical tools using standard diagnostic performance metrics: sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV).
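For readers less familiar with these metrics, here is a minimal sketch of how they fall out of a 2x2 confusion matrix of chatbot verdicts versus a reference tool; the counts used below are invented for illustration and are not the study's data:

```python
# Minimal sketch: deriving the study's metrics from a 2x2 confusion matrix
# of chatbot answers versus a reference DDI tool. The counts passed in at the
# bottom are invented for illustration; they are not the study's data.
def performance_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),               # true positive rate
        "specificity": tn / (tn + fp),               # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn), # overall agreement
        "ppv": tp / (tp + fp),                       # positive predictive value
        "npv": tn / (tn + fn),                       # negative predictive value
    }


# Hypothetical split of 255 drug pairs into confusion-matrix counts.
print(performance_metrics(tp=150, fp=30, tn=60, fn=15))
```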

Results

The results showed that Microsoft Bing AI had the highest accuracy and specificity, outperforming Google's Bard, ChatGPT-3.5, and ChatGPT-4. The specificity of the chatbots ranged from 0.372 (ChatGPT-3.5) to 0.769 (Microsoft Bing AI), while the accuracy ranged from 0.469 (ChatGPT-3.5) to 0.788 (Microsoft Bing AI).

Key Findings

The study found that:

  • Bing AI had the highest accuracy and specificity among the four chatbots evaluated.
  • ChatGPT-3.5 had the lowest specificity and accuracy among the four chatbots.
  • The performance of the chatbots improved when a free DDI source was used as a reference.
  • ChatGPT-3.5 and ChatGPT-4 showed high variability in accuracy across different drug classes (see the sketch below).
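
To make the last point concrete, here is a minimal sketch of a per-drug-class accuracy breakdown; the column names and rows are invented for illustration and are not the study's data:

```python
# Minimal sketch: per-drug-class accuracy, the kind of breakdown behind the
# variability finding. Column names and rows are invented for illustration.
import pandas as pd

results = pd.DataFrame({
    "drug_class": ["anticoagulant", "anticoagulant", "antibiotic", "antibiotic"],
    "chatbot_says_interaction": [True, False, True, True],
    "reference_says_interaction": [True, True, True, False],
})

# A row is correct when the chatbot's verdict matches the reference tool.
results["correct"] = (
    results["chatbot_says_interaction"] == results["reference_says_interaction"]
)
per_class_accuracy = results.groupby("drug_class")["correct"].mean()
print(per_class_accuracy)
```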

Conclusion

The study highlights the potential of AI chatbots in transforming patient care, particularly in detecting potential DDIs. While the current AI platforms evaluated have limitations, their ability to quickly analyze significant interactions with good sensitivity suggests a promising step towards improved patient safety. However, further research is needed to establish the accuracy and reliability of these chatbots in clinical settings.

Future Directions

The use of AI chatbots in healthcare is a rapidly evolving field, and further studies are needed to explore their potential applications and limitations. Some potential areas for future research include:

  • Evaluating the performance of AI chatbots in detecting DDIs in real-world clinical settings.
  • Comparing the performance of different AI chatbots in detecting DDIs.
  • Exploring the potential of AI chatbots in other areas of healthcare, such as disease diagnosis and treatment.

References

For more information on the study, please refer to the original article published in Drug, Healthcare and Patient Safety. Related reading includes Performance of Artificial Intelligence Chatbots on Glaucoma Questions Adapted From Patient Brochures and Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing.
