A recent study found that Large Language Models (LLMs) such as ChatGPT can generate feature attribution explanations for their own predictions, but these self-explanations are neither consistently better nor consistently worse than traditional explanation methods. No method is a clear winner across the different faithfulness metrics, and the explanations produced by different methods disagree with one another to a high degree. Additionally, the attribution values that LLMs produce tend to be round numbers with little fine-grained variation, which suggests a human-like reasoning approach but raises questions about their precision and utility.
Can LLMs Really Explain Themselves? A Look at ChatGPT's Explanatory Abilities
This study explores how LLMs explain their decisions, revealing strengths and weaknesses. Learn about accuracy trade-offs, model behavior, and how to leverage self-explanations for better AI interaction.
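To make "disagreement" and "round values" concrete, here is a minimal Python sketch. It is not the study's code: the word scores are made-up illustrative numbers, and `top_k_agreement` and `rounded_fraction` are hypothetical helper names chosen for this example. The first function measures how many of the top-k most important words two explanations share; the second is a crude proxy for how "round" a set of attribution values is.

```python
from typing import Dict


def top_k_agreement(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Fraction of the k most important words (by absolute score) that both explanations share."""
    def top_k(scores: Dict[str, float]) -> set:
        ranked = sorted(scores, key=lambda w: abs(scores[w]), reverse=True)
        return set(ranked[:k])
    return len(top_k(a) & top_k(b)) / k


def rounded_fraction(scores: Dict[str, float], step: float = 0.05) -> float:
    """Share of scores that fall on a multiple of `step` (an assumed proxy for 'round' values)."""
    def on_grid(v: float) -> bool:
        return abs(v / step - round(v / step)) < 1e-6
    return sum(on_grid(v) for v in scores.values()) / len(scores)


# Hypothetical attributions for one sentiment example (illustrative numbers only).
self_expl = {"movie": 0.25, "great": 0.75, "not": -0.5, "boring": -0.25}
occlusion = {"movie": 0.031, "great": 0.412, "not": -0.287, "boring": -0.334}

print(top_k_agreement(self_expl, occlusion, k=2))  # overlap of the two top-2 word sets
print(rounded_fraction(self_expl))                 # self-explanation values on a 0.05 grid
print(rounded_fraction(occlusion))                 # occlusion-style values on a 0.05 grid
```

With these made-up numbers, the two explanations share only one of their top two words, and every self-explanation value sits on the 0.05 grid while none of the occlusion-style values do. Real comparisons would average such measures over many examples and would also need faithfulness metrics, which require access to the model's predictions.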