A recent study found that large language models (LLMs) such as ChatGPT can generate feature attribution explanations for their own predictions, but these self-explanations are neither consistently better nor consistently worse than traditional explanation methods. No approach is a clear winner across the faithfulness metrics examined, and the explanations show high disagreement with one another. The attribution values that LLMs produce also tend to be well-rounded and lack fine-grained variation, which may reflect a human-like way of assigning importance but raises questions about their precision and utility.
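
To make "high disagreement" and "well-rounded values" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's evaluation code. The two helpers (`top_k_agreement`, `distinct_value_ratio`) and all score values are hypothetical and chosen only for this example. Top-k overlap is one simple way to compare two attribution vectors, and the study may use different agreement metrics.

```python
from typing import Sequence


def top_k_agreement(a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Fraction of the k highest-magnitude features that a and b share."""
    if len(a) != len(b):
        raise ValueError("attribution vectors must have the same length")
    if not 0 < k <= len(a):
        raise ValueError("k must be between 1 and the number of features")

    def top_k(values: Sequence[float]) -> set:
        # Rank feature indices by absolute attribution, largest first.
        order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
        return set(order[:k])

    return len(top_k(a) & top_k(b)) / k


def distinct_value_ratio(values: Sequence[float]) -> float:
    """Share of distinct values; a low ratio means coarse, repeated scores."""
    return len(set(values)) / len(values)


# Hypothetical attribution scores for a five-word input (made-up numbers).
llm_scores = [0.5, 0.0, 0.75, 0.0, 0.5]            # rounded, repeated values
baseline_scores = [0.12, 0.41, 0.68, 0.03, 0.09]   # fine-grained values

agreement = top_k_agreement(llm_scores, baseline_scores, k=2)
llm_ratio = distinct_value_ratio(llm_scores)
baseline_ratio = distinct_value_ratio(baseline_scores)

print(f"top-2 agreement: {agreement}")
print(f"distinct-value ratio (LLM): {llm_ratio}")
print(f"distinct-value ratio (baseline): {baseline_ratio}")
```

In this toy case the two explanations share only one of their two most important features, and the LLM-style scores reuse the same few values while the baseline scores are all distinct.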

Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations
Large language models (LLMs) such as ChatGPT have demonstrated superior performance on a variety of natural language processing (NLP) tasks including sentiment