[Paper Reproduction] ReDeEP

Method diagram (figure omitted).

Reference project: Jeryi-Sun/ReDEeP-ICLR; for details, see Xuan-Van/ReDeEP.

1 Installation

1.1 Virtual Environment

conda create -n redeep python=3.9
conda activate redeep
pip install numpy==1.26.0 torch==2.0.1 accelerate==0.23.0 pandas==2.1.1 scikit-learn==1.3.1 sentence_transformers ipykernel
python -m ipykernel install --user --name redeep
jupyter kernelspec list

cd src
pip install -e transformers

1.2 Project Structure

dataset/
    copy_heads           # copy-head information
    dolly                # dataset
    ragtruth             # dataset
    token_hyperparameter # hyperparameters for AARF.py

log/                     # run results

src/                     # project scripts
    AARF.py
    detect.py
    regress.py

transformers/            # modified transformers library

test.sh                  # script used for the further experiments (§1.5)

1.3 Models

huggingface-cli download --resume-download meta-llama/Llama-2-7b-chat-hf --token Your_token --local-dir model/Llama-2-7b-chat-hf
huggingface-cli download --resume-download BAAI/bge-base-en-v1.5 --local-dir model/bge-base-en-v1.5

1.4 Datasets

Dataset download: Google Drive

Taking the RAGTruth dataset as an example, its structure is as follows (a small loading sketch follows the two schemas):

1. `response.jsonl`:
```json
{
    "id": str,                     # index of the response
    "source_id": str,              # index of the source information
    "model": str,                  # model that generated the response: gpt-4-0613, gpt-3.5-turbo-0613, mistral-7B-instruct, llama-2-7b-chat, llama-2-13b-chat, llama-2-70b-chat
    "temperature": float,          # sampling temperature: 0.7, 0.775, 0.85, 0.925, 1.0
    "labels": [
        {
            "start": int,          # start position within the response
            "end": int,            # end position within the response
            "text": str,           # hallucinated text in the response
            "meta": str,           # annotator's comment on the hallucination
            "label_type": str,     # type of the hallucination
            "implicit_true": bool, # whether it conflicts with the context: the response is correct but the context does not mention it
            "due_to_null": bool,   # whether the hallucination is caused by a null value
        },
        ...
    ],
    "split": str,                  # train, test
    "quality": str,                # good (high-quality response), incorrect_refusal (the model wrongly refused to answer despite relevant context), truncated (the response was cut off unexpectedly)
    "response": str,               # the LLM's response to the given instruction
}
```

2. `source_info.jsonl`:
```json
{
    "source_id": str,   # index of the source information
    "task_type": str,   # Summary, QA, Data2txt
    "source": str,      # origin of the content: CNN/DM, Recent News, Yelp, MARCO
    "source_info": str, # base content in the RAG setting: a string for Summary, a dict for the other tasks
    "prompt": str,      # the prompt used to generate the response
}
```
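
A minimal loading sketch (mine, not from the project): the two files join on `source_id`, with paths assuming the RAGTruth files sit under `dataset/ragtruth/` as in the layout above:

```python
import json

# Load the responses and index the source information by source_id
with open("dataset/ragtruth/response.jsonl") as f:
    responses = [json.loads(line) for line in f]
with open("dataset/ragtruth/source_info.jsonl") as f:
    source_info = {rec["source_id"]: rec for rec in map(json.loads, f)}

# Each response links back to the context it was generated from
for resp in responses:
    src = source_info[resp["source_id"]]
    print(resp["id"], src["task_type"], len(resp["labels"]))
```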

1.5 Further Experiments

Under the same evaluation metrics, replacing the copy heads with every layer's attention heads gives the following results: (results figure omitted)

2 Script Analysis

2.1 File Structure

dataset/
    llama2_7B_response_chunk.json  # dataset built for the chunk-level regress.py
    llama2_7B_response_token.json  # dataset built for the token-level regress.py
    response.jsonl                 # a single record, for the token-level detect.py and AARF.py
    response_spans.jsonl           # a single record, for the chunk-level detect.py
    source_info_chunk.jsonl        # for the chunk-level regress.py
    source_info.jsonl              # a single record, for the token-level detect.py and AARF.py
    source_info_spans.jsonl        # a single record, for the chunk-level detect.py
    token_hyperparameter.json      # hyperparameters for AARF.py
    topk_heads.json                # for detect.py and the token-level regress.py

output/
    AARF_add_1.2_reduce_0.8_threshold_0.6.json # produced by AARF.py
    llama2_7B_response_chunk.json              # produced by the chunk-level detect.py
    llama2_7B_response_token.json              # produced by the token-level detect.py
    ReDeEP_chunk.json                          # produced by the chunk-level regress.py
    ReDeEP_token.json                          # produced by the token-level regress.py

src/ # script-analysis notebooks
    token_detect.ipynb
    chunk_detect.ipynb
    token_regress.ipynb
    chunk_regress.ipynb
    AARF.ipynb

transformers/ # modified transformers package

2.2 detect.py

2.2.1 Token Level

1. Import the required packages:
import sys
sys.path.insert(0, '../transformers/src') # put the modified transformers package first on the module search path

import torch
import json
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.nn import functional as F
from tqdm import tqdm
2. Load the responses:
response = []
with open("../dataset/response.jsonl", 'r') as f:
    for line in f:
        data = json.loads(line)
        response.append(data)

print(json.dumps(response, ensure_ascii=False, indent=4))

[
    {
        "id": "27",
        "source_id": "15596",
        "model": "llama-2-7b-chat",
        "temperature": 0.7,
        "labels": [],
        "split": "test",
        "quality": "good",
        "response": "FBI charges Philadelphia woman with attempting to join ISIS after purchasing electronic visa for Turkey. Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah,\" made numerous social media posts expressing desire to fight for ISIS. She could face 15 years in prison. Three women have been arrested this week on terror charges, including two in New York who were accused of planning to build an explosive device for attacks in the US."
    }
]
3. Load the source info:
source_info_dict = {}
with open("../dataset/source_info.jsonl", 'r') as f:
    for line in f:
        data = json.loads(line)
        source_info_dict[data['source_id']] = data

print(json.dumps(source_info_dict, ensure_ascii=False, indent=4))

{
    "15596": {
        "source_id": "15596",
        "task_type": "Summary",
        "source": "CNN/DM",
        "source_info": "The FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah.\" One Twitter message said, \"If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs].\" Another said, \"When you're a mujahid [violent jihadi fighter] your death becomes a wedding.\" The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. \"The terrorist threat is more decentralized, more diffuse, more complicated,\" Homeland Security Secretary Jeh Johnson told reporters Thursday. \"It involves the potential lone wolf actor, it involves the effective use of social media, the Internet.\"\n",
        "prompt": "Summarize the following news within 86 words:\nThe FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah.\" One Twitter message said, \"If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs].\" Another said, \"When you're a mujahid [violent jihadi fighter] your death becomes a wedding.\" The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. \"The terrorist threat is more decentralized, more diffuse, more complicated,\" Homeland Security Secretary Jeh Johnson told reporters Thursday. \"It involves the potential lone wolf actor, it involves the effective use of social media, the Internet.\"\n\noutput:"
    }
}
4. Load the model and tokenizer:
model_name = "../../model/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
device = "cuda"

Loading checkpoint shards: 100%|██████████| 2/2 [00:35<00:00, 17.92s/it]
5. Load the copy heads:
with open("../dataset/topk_heads.json", 'r') as f:
    copy_heads = json.load(f)

print(copy_heads)
print(len(copy_heads))

[[25, 0], [18, 13], [18, 10], [27, 9], [5, 29], [23, 8], [31, 28], [3, 0], [31, 24], [13, 20], [31, 18], [1, 14], [2, 5], [22, 10], [2, 22], [15, 7], [3, 19], [20, 17], [10, 20], [23, 30], [20, 22], [1, 27], [20, 1], [31, 19], [28, 18], [20, 15], [1, 21], [19, 1], [20, 5], [16, 1], [18, 9], [5, 13]]
32
6. Select the data type, which corresponds to the `model` field in the JSONL (a filtering sketch follows the output):
data_type = "llama-2-7b-chat"

select_response = []
i = 0
response[i]['model'] == data_type and response[i]["split"] == "test"

True
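
This notebook only verifies the condition for the single record `i = 0`; in the full detect.py the same condition presumably acts as a filter over all records, roughly:

```python
# Sketch (not the script's exact code): indices of test-split responses
# generated by the selected model
selected_ids = [
    idx for idx, item in enumerate(response)
    if item['model'] == data_type and item['split'] == 'test'
]
```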
7. Extract the fields:
response_rag = response[i]['response']
source_id = response[i]['source_id']
temperature = response[i]['temperature']
prompt = source_info_dict[source_id]['prompt']

print(response_rag)
print(source_id)
print(temperature)
print(prompt)

FBI charges Philadelphia woman with attempting to join ISIS after purchasing electronic visa for Turkey. Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah," made numerous social media posts expressing desire to fight for ISIS. She could face 15 years in prison. Three women have been arrested this week on terror charges, including two in New York who were accused of planning to build an explosive device for attacks in the US.
15596
0.7
Summarize the following news within 86 words:
The FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah." One Twitter message said, "If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs]." Another said, "When you're a mujahid [violent jihadi fighter] your death becomes a wedding." The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. "The terrorist threat is more decentralized, more diffuse, more complicated," Homeland Security Secretary Jeh Johnson told reporters Thursday. "It involves the potential lone wolf actor, it involves the effective use of social media, the Internet."

output:
8. Construct the model input:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt[:12000]} # truncate to the first 12000 characters
]
messages

[{'role': 'system', 'content': 'You are a helpful assistant.'},
{'role': 'user',
'content': 'Summarize the following news within 86 words:\nThe FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She\'s one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah." One Twitter message said, "If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs]." Another said, "When you\'re a mujahid [violent jihadi fighter] your death becomes a wedding." The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It\'s not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department\'s National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. "The terrorist threat is more decentralized, more diffuse, more complicated," Homeland Security Secretary Jeh Johnson told reporters Thursday. "It involves the potential lone wolf actor, it involves the effective use of social media, the Internet."\n\noutput:'}]
9. Convert `messages` into a structured text string:
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) # do not tokenize; append the generation-prompt marker
print(text)

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Summarize the following news within 86 words:
The FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah." One Twitter message said, "If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs]." Another said, "When you're a mujahid [violent jihadi fighter] your death becomes a wedding." The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. "The terrorist threat is more decentralized, more diffuse, more complicated," Homeland Security Secretary Jeh Johnson told reporters Thursday. "It involves the potential lone wolf actor, it involves the effective use of social media, the Internet."

output: [/INST]
10. Build the model's full input plus output:
input_text = text + response_rag
print(input_text)

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Summarize the following news within 86 words:
The FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah." One Twitter message said, "If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs]." Another said, "When you're a mujahid [violent jihadi fighter] your death becomes a wedding." The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. "The terrorist threat is more decentralized, more diffuse, more complicated," Homeland Security Secretary Jeh Johnson told reporters Thursday. "It involves the potential lone wolf actor, it involves the effective use of social media, the Internet."

output: [/INST]FBI charges Philadelphia woman with attempting to join ISIS after purchasing electronic visa for Turkey. Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah," made numerous social media posts expressing desire to fight for ISIS. She could face 15 years in prison. Three women have been arrested this week on terror charges, including two in New York who were accused of planning to build an explosive device for attacks in the US.
11. Convert the text strings into token ID sequences; `text` is the model input (system prompt + question) and `response_rag` is the model's response:
input_ids = tokenizer([input_text], return_tensors="pt").input_ids # input_ids = prefix_ids + continue_ids
prefix_ids = tokenizer([text], return_tensors="pt").input_ids
continue_ids = input_ids[0, prefix_ids.shape[-1]:]

print(input_ids.shape)
print(prefix_ids.shape)
print(continue_ids.shape)

torch.Size([1, 670])
torch.Size([1, 564])
torch.Size([106])
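
The three shapes are consistent by construction, since `continue_ids` is the tail of `input_ids` that follows the prompt prefix:

```python
# 670 == 564 + 106
assert input_ids.shape[-1] == prefix_ids.shape[-1] + continue_ids.shape[-1]
```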
12. Locate the hallucinated text spans. This walkthrough re-runs the model's inference over the concatenated input, so the character-level spans must be re-mapped to token positions:
def calculate_hallucination_spans(response, text, response_rag, tokenizer, prefix_len):
    hallucination_span = []

    # Iterate over every hallucinated text span
    for item in response:
        # Character-level start and end positions of the span
        start_id = item['start']
        end_id = item['end']

        start_text = text + response_rag[:start_id] # text preceding the hallucinated span
        end_text = text + response_rag[:end_id]     # preceding text plus the hallucinated span

        # Convert the text strings into token ID sequences
        start_text_id = tokenizer(start_text, return_tensors="pt").input_ids
        end_text_id = tokenizer(end_text, return_tensors="pt").input_ids

        # Lengths of the token ID sequences
        start_id = start_text_id.shape[-1]
        end_id = end_text_id.shape[-1]

        # The two lengths give the span's start and end positions in token space
        hallucination_span.append([start_id, end_id])

    return hallucination_span

# Locate the hallucinated spans: hallucination_spans holds the start/end positions
# in input_ids of every hallucinated span of the response
if "labels" in response[i].keys(): # prefix_ids.shape[-1] is the length of the model input
    hallucination_spans = calculate_hallucination_spans(response[i]['labels'], text, response_rag, tokenizer, prefix_ids.shape[-1])
else:
    hallucination_spans = []
13. Run model inference:
start_p, end_p = None, None
start, number = 0, 32

with torch.no_grad():
    logits_dict, outputs = model(
        input_ids=input_ids.to(device),
        return_dict=True,
        output_attentions=True,    # return the attention scores of every layer
        output_hidden_states=True, # return the hidden states of every layer
        knowledge_layers=list(range(start, number)) # return the MLP output states of the specified layers
    )

print(outputs.keys()) # past_key_values is the cached key/value pairs that speed up autoregressive generation

odict_keys(['logits', 'past_key_values', 'hidden_states', 'attentions'])
14. For the MLP layers, `value[0]` is the MLP layer's output and `value[1]` is its residual connection (see the sketch after this step):
logits_dict = {key: [value[0].to(device), value[1].to(device)] for key, value in logits_dict.items()} # move the tensors to the GPU

print(logits_dict.keys())
print(logits_dict[0][0].shape, logits_dict[0][1].shape)

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
torch.Size([1, 670, 32000]) torch.Size([1, 670, 32000])
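
The `knowledge_layers` argument exists only in the modified transformers package shipped with the project. Judging from the shapes above (both tensors are vocabulary-sized), each layer's pair is presumably obtained by projecting two intermediate states through the LM head, logit-lens style; a sketch of the idea, as an assumption about the modified library rather than its actual code:

```python
def project_to_vocab(hidden_state, model):
    # Assumption: apply the final RMSNorm and the LM head to an intermediate
    # hidden state, yielding logits over the 32000-token vocabulary
    return model.lm_head(model.model.norm(hidden_state))

# value[0] would then be project_to_vocab(MLP output) and
# value[1] project_to_vocab(its residual input), per the description above
```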
15. `outputs` is the model output, containing the logits, hidden_states, and attentions:
hidden_states = outputs["hidden_states"]  # hidden states of all layers
print(type(hidden_states))
print(len(hidden_states)) # the embedding layer + 32 decoder layers
print(hidden_states[0].shape) # shape of each layer

last_hidden_states = hidden_states[-1][0, :, :] # last layer's hidden states, used to compute ECS
print(last_hidden_states.shape)

<class 'tuple'>
33
torch.Size([1, 670, 4096])
torch.Size([670, 4096])
16. Define the hallucination-detection variables:
external_similarity = []            # ECS (external context score)
parameter_knowledge_difference = [] # PKS (parametric knowledge score)
hallucination_label = []            # hallucination labels
17. Collect the attention scores of the copy heads:
attentions_list = []
for attentions_layer_id in range(len(outputs.attentions)): # every layer
    for head_id in range(outputs.attentions[attentions_layer_id].shape[1]): # every head in the layer
        if [attentions_layer_id, head_id] not in copy_heads: # keep only the heads listed in copy_heads
            continue
        attentions_list.append({"layer_head": (attentions_layer_id, head_id), # record the layer and head IDs
                                "attention_score": outputs.attentions[attentions_layer_id][:, head_id, :, :]}) # record the corresponding attention scores

print(outputs.attentions[attentions_layer_id].shape)
print(outputs.attentions[attentions_layer_id][:, head_id, :, :].shape)
print(len(attentions_list))

torch.Size([1, 32, 670, 670])
torch.Size([1, 670, 670])
32
18. Helper functions: JS-divergence computation and hallucination-token check (a small sanity check follows the definitions):
# Compute the JS divergence: the similarity between two probability distributions
def calculate_dist(sep_vocabulary_dist, sep_attention_dist):
    # Convert the inputs into probability distributions
    softmax_mature_layer = F.softmax(sep_vocabulary_dist, dim=-1)
    softmax_anchor_layer = F.softmax(sep_attention_dist, dim=-1)

    # Average of the two distributions
    M = 0.5 * (softmax_mature_layer + softmax_anchor_layer)

    # Log form of the two distributions
    log_softmax_mature_layer = F.log_softmax(sep_vocabulary_dist, dim=-1)
    log_softmax_anchor_layer = F.log_softmax(sep_attention_dist, dim=-1)

    # KL divergence of each distribution against the average
    kl1 = F.kl_div(log_softmax_mature_layer, M, reduction='none').mean(-1)
    kl2 = F.kl_div(log_softmax_anchor_layer, M, reduction='none').mean(-1)

    # JS divergence
    js_divs = 0.5 * (kl1 + kl2)

    return js_divs.cpu().item() * 10e5 # scale the value up by 10e5 (i.e. 1e6) for readability

# Check whether a given token falls inside one of the predefined hallucinated spans
def is_hallucination_token(token_id, hallucination_spans):
    for span in hallucination_spans:
        if token_id >= span[0] and token_id <= span[1]:
            return True
    return False
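
A quick sanity check of `calculate_dist` (illustrative only): identical logits yield a JS divergence of zero, while different logits yield a positive value:

```python
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([3.0, 2.0, 1.0])
print(calculate_dist(a, a))  # 0.0
print(calculate_dist(a, b))  # > 0
```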
19. Compute ECS and PKS and mark the hallucination labels (summarized in formulas after the output):
# Iterate over every token ID of the response (the row at position t generates token t+1, hence the -1 offsets)
for seq_i in range(prefix_ids.shape[-1] - 1, input_ids.shape[-1] - 1):
    pointer_scores_list = [attention_dict["attention_score"][:, seq_i, :] for attention_dict in attentions_list] # for each copy head, the attention row of this token ID

    if start_p != None and end_p != None:
        pointer_probs_list = torch.cat([pointer_scores[:, start_p:end_p] for pointer_scores in pointer_scores_list], dim=0)
    else: # keep only the attention scores over the model input (the prompt)
        pointer_probs_list = torch.cat([pointer_scores[:, :prefix_ids.shape[-1]] for pointer_scores in pointer_scores_list], dim=0)

    top_k = int(pointer_probs_list.shape[-1] * 0.1) # top_k length, i.e. how many of the highest-scoring token IDs to keep
    sorted_indices = torch.argsort(pointer_probs_list, dim=1, descending=True) # indices sorted by score in descending order
    top_k_indices = sorted_indices[:, :top_k] # take the first top_k indices
    flattened_indices = top_k_indices.flatten() # flatten top_k_indices

    selected_hidden_states = last_hidden_states[flattened_indices] # look up the corresponding hidden states in last_hidden_states
    top_k_hidden_states = selected_hidden_states.view(top_k_indices.shape[0], top_k_indices.shape[1], -1) # reshape
    attend_token_hidden_state = torch.mean(top_k_hidden_states, dim=1) # mean of the hidden states

    current_hidden_state = last_hidden_states[seq_i, :] # last-layer hidden state of the current token ID
    current_hidden_state = current_hidden_state.unsqueeze(0).expand(attend_token_hidden_state.shape) # broadcast to the shape of attend_token_hidden_state, i.e. repeat current_hidden_state

    cosine_similarity = F.cosine_similarity(attend_token_hidden_state.to(device), current_hidden_state.to(device), dim=1) # cosine similarity

    if is_hallucination_token(seq_i, hallucination_spans): # check whether the current token ID lies in a hallucinated span
        hallucination_label.append(1)
    else:
        hallucination_label.append(0)

    external_similarity.append(cosine_similarity.cpu().tolist())
    parameter_knowledge_difference.append([calculate_dist(value[0][0, seq_i, :], value[1][0, seq_i, :]) for value in logits_dict.values()])
    torch.cuda.empty_cache()

print(len(pointer_scores_list), pointer_scores_list[0].shape)
print(len(pointer_probs_list), pointer_probs_list[0].shape)
print(top_k)
print(sorted_indices.shape)
print(top_k_indices.shape)
print(flattened_indices.shape)
print(selected_hidden_states.shape)
print(top_k_hidden_states.shape)
print(attend_token_hidden_state.shape)
print(current_hidden_state.shape)
print(current_hidden_state[0]==current_hidden_state[-1])

32 torch.Size([1, 670])
32 torch.Size([564])
56
torch.Size([32, 564])
torch.Size([32, 56])
torch.Size([1792])
torch.Size([1792, 4096])
torch.Size([32, 56, 4096])
torch.Size([32, 4096])
torch.Size([32, 4096])
tensor([True, True, True,  ..., True, True, True], device='cuda:0')
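
In symbols (notation mine, matching the code above): for every response token $t$ the loop yields 32 ECS values, one per copy head $h$, and 32 PKS values, one per layer $\ell$:

$$\mathrm{ECS}_h(t)=\cos\!\Big(\mathbf{h}_t,\;\tfrac{1}{k}\sum_{j\in\mathrm{top}\text{-}k_h(t)}\mathbf{h}_j\Big),\qquad \mathrm{PKS}_\ell(t)=\mathrm{JSD}\big(p_\ell^{\mathrm{mlp}}(t)\,\|\,p_\ell^{\mathrm{res}}(t)\big)$$

where $\mathbf{h}_t$ is the last-layer hidden state of token $t$, $\mathrm{top}\text{-}k_h(t)$ is the set of prompt tokens head $h$ attends to most ($k$ = 10% of the prompt length, 56 here), and $p_\ell^{\mathrm{mlp}}, p_\ell^{\mathrm{res}}$ are the two vocabulary distributions stored in `logits_dict`.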
20. Inspect the results:
response[i]["external_similarity"] = external_similarity
response[i]["parameter_knowledge_difference"] = parameter_knowledge_difference
response[i]["hallucination_label"] = hallucination_label

select_response.append(response[i])
print(select_response[i].keys())
print(len(select_response[i]["external_similarity"]), len(select_response[i]["external_similarity"][0]))
print(len(select_response[i]["parameter_knowledge_difference"]), len(select_response[i]["parameter_knowledge_difference"][0]))
print(len(select_response[i]["hallucination_label"]))

dict_keys(['id', 'source_id', 'model', 'temperature', 'labels', 'split', 'quality', 'response', 'external_similarity', 'parameter_knowledge_difference', 'hallucination_label'])
106 32
106 32
106
21. Save the results (a sketch of the downstream regression step follows):
with open("../output/llama2_7B_response_token.json", "w") as f:
    json.dump(select_response, f, ensure_ascii=False)
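
Per the file structure in §2.1, this file then feeds the token-level regress.py, which produces ReDeEP_token.json. As a hedged sketch of that downstream step (not the actual regress.py), the 32 + 32 per-token features could be combined into a hallucination score with a simple classifier:

```python
import json
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch: fit a classifier on the saved per-token features.
# It needs the full dataset (both hallucinated and clean tokens), not the
# single-record demo file used in this walkthrough.
with open("../output/llama2_7B_response_token.json") as f:
    records = json.load(f)

X, y = [], []
for rec in records:
    for ecs, pks, label in zip(rec["external_similarity"],
                               rec["parameter_knowledge_difference"],
                               rec["hallucination_label"]):
        X.append(ecs + pks)  # 32 ECS values followed by 32 PKS values
        y.append(label)

clf = LogisticRegression(max_iter=1000).fit(np.array(X), np.array(y))
```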

2.2.2 Chunk Level

1. Import the required packages:
import sys
sys.path.insert(0, '../transformers/src')
import torch
import json
import numpy as np
import argparse
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.nn import functional as F
from tqdm import tqdm
from sentence_transformers import SentenceTransformer
2. Load the embedding model:
bge_model = SentenceTransformer('../../model/bge-base-en-v1.5/').to("cuda:0")
3. Load the responses:
response = []
with open("../dataset/response_spans.jsonl", "r") as f:
    for line in f:
        data = json.loads(line)
        response.append(data)

print(json.dumps(response, ensure_ascii=False, indent=4))

[
    {
        "id": "45",
        "source_id": "15599",
        "model": "llama-2-7b-chat",
        "temperature": 0.7,
        "labels": [],
        "split": "test",
        "quality": "good",
        "response": "Blue Bell ice cream has temporarily shut down one of its manufacturing plants after discovering listeria contamination in a serving of ice cream produced at the plant. The Centers for Disease Control and Prevention (CDC) has warned consumers not to eat any Blue Bell-branded products made at the Broken Arrow, Oklahoma plant, including 3-ounce servings of ice cream marked with certain codes. This is the third time Blue Bell has taken action due to a listeria outbreak at a Kansas hospital that served the company's ice cream. Investigations into the possible connection between the ice cream and the infections are ongoing. The company has recalled other products and advises individuals and institutions to check their freezers for the recalled items and throw them away. This is the first product recall in Blue Bell's 108-year history.",
        "response_spans": [
            [
                0,
                506
            ],
            [
                491,
                840
            ]
        ]
    }
]
4. Load the source info:
source_info_dict = {}
with open("../dataset/source_info_spans.jsonl", 'r') as f:
    for line in f:
        data = json.loads(line)
        source_info_dict[data['source_id']] = data

print(json.dumps(source_info_dict, ensure_ascii=False, indent=4))

{
    "15599": {
        "source_id": "15599",
        "task_type": "Summary",
        "source": "CNN/DM",
        "source_info": "Blue Bell ice cream has temporarily shut down one of its manufacturing plants over the discovery of listeria contamination in a serving of ice cream originating from that plant. Public health officials warned consumers Friday not to eat any Blue Bell-branded products made at the company's Broken Arrow, Oklahoma, plant. That includes 3-ounce servings of Blue Bell ice cream from this plant that went to institutions in containers marked with the letters O, P, Q, R, S or T behind the coding date. The warning by the Centers for Disease Control and Prevention does not affect other Blue Bell ice cream, including other 3-ounce servings, not made at the plant. But Blue Bell has recalled other products. The company is shutting down the Broken Arrow facility \"out of an abundance of caution\" to search for a possible cause of contamination. It is the third time Blue Bell has taken action in light of a listeria outbreak at a Kansas hospital that served the company's ice cream. Listeria monocytogenes was recently found in a cup of ice cream recovered from the hospital. The cup contaminated with the bacteria was produced at the Broken Arrow plant in April 2014, Blue Bell said. And, according to the CDC, listeria bacteria was found in additional samples of the same product that were recovered from the plant. The bacteria in the hospital sample and the factory sample appeared to match each other genetically, the CDC said. But they did not appear identical to listeria samples taken from patients infected in the Kansas outbreak. In a separate outbreak in Texas, the CDC did find that listeria samples taken from patients who came down with listeriosis between 2010 and 2014 in a hospital that served 3-ounce Blue Bell cups matched the listeria in recovered samples. None of this means the ice cream is the source of either spate of the infections. \"Investigation to determine whether these illnesses are related to exposure to Blue Bell products is ongoing,\" the CDC said. In early March, in light of the Kansas listeria outbreak, Blue Bell recalled a group of products made at a plant in Texas. It later added 3-ounce cup servings to the recall. Five people were infected and three died in the past year in Kansas from listeria that might be linked to Blue Bell Creameries products, according to the CDC. All five of them were hospitalized at the same hospital before developing listeriosis, the CDC said. At least four of them had consumed milkshakes made with Blue Bell ice cream before developing the infection. \"We are devastated and know that Blue Bell has to be and can be better than this,\" Paul Kruse, Blue Bell CEO and president, said in a statement. \"Quality and safety have always been our top priorities. We are deeply saddened and concerned for all those who have been affected.\" The CDC advises that individuals and institutions should check their freezers for the recalled products and throw them away. In a statement on its website, Blue Bell said \"this recall in no way includes Blue Bell ice cream half gallons, pints, quarts, 3 gallons or other 3 oz. cups.\" This has been the first product recall in the 108-year history of Blue Bell Creameries, the company said. Listeriosis is a serious infection caused by eating food contaminated with listeria, and primarily affects the elderly, pregnant women, newborns and people with weakened immune systems, according to the CDC. 
Symptoms of a listeria infection are fever and muscle aches, sometimes associated with diarrhea or other gastrointestinal symptoms. In the United States, an estimated 1,600 people become seriously ill each year, and approximately 16% of these illnesses result in death. Cervical infections caused by listeriosis in pregnant women may result in stillbirth or spontaneous abortion during the second or third trimesters. CNN's Debra Goldschmidt, Amanda Watts and Jacque Wilson contributed to this report.\n",
        "prompt": "Summarize the following news within 161 words:\nBlue Bell ice cream has temporarily shut down one of its manufacturing plants over the discovery of listeria contamination in a serving of ice cream originating from that plant. Public health officials warned consumers Friday not to eat any Blue Bell-branded products made at the company's Broken Arrow, Oklahoma, plant. That includes 3-ounce servings of Blue Bell ice cream from this plant that went to institutions in containers marked with the letters O, P, Q, R, S or T behind the coding date. The warning by the Centers for Disease Control and Prevention does not affect other Blue Bell ice cream, including other 3-ounce servings, not made at the plant. But Blue Bell has recalled other products. The company is shutting down the Broken Arrow facility \"out of an abundance of caution\" to search for a possible cause of contamination. It is the third time Blue Bell has taken action in light of a listeria outbreak at a Kansas hospital that served the company's ice cream. Listeria monocytogenes was recently found in a cup of ice cream recovered from the hospital. The cup contaminated with the bacteria was produced at the Broken Arrow plant in April 2014, Blue Bell said. And, according to the CDC, listeria bacteria was found in additional samples of the same product that were recovered from the plant. The bacteria in the hospital sample and the factory sample appeared to match each other genetically, the CDC said. But they did not appear identical to listeria samples taken from patients infected in the Kansas outbreak. In a separate outbreak in Texas, the CDC did find that listeria samples taken from patients who came down with listeriosis between 2010 and 2014 in a hospital that served 3-ounce Blue Bell cups matched the listeria in recovered samples. None of this means the ice cream is the source of either spate of the infections. \"Investigation to determine whether these illnesses are related to exposure to Blue Bell products is ongoing,\" the CDC said. In early March, in light of the Kansas listeria outbreak, Blue Bell recalled a group of products made at a plant in Texas. It later added 3-ounce cup servings to the recall. Five people were infected and three died in the past year in Kansas from listeria that might be linked to Blue Bell Creameries products, according to the CDC. All five of them were hospitalized at the same hospital before developing listeriosis, the CDC said. At least four of them had consumed milkshakes made with Blue Bell ice cream before developing the infection. \"We are devastated and know that Blue Bell has to be and can be better than this,\" Paul Kruse, Blue Bell CEO and president, said in a statement. \"Quality and safety have always been our top priorities. We are deeply saddened and concerned for all those who have been affected.\" The CDC advises that individuals and institutions should check their freezers for the recalled products and throw them away. In a statement on its website, Blue Bell said \"this recall in no way includes Blue Bell ice cream half gallons, pints, quarts, 3 gallons or other 3 oz. cups.\" This has been the first product recall in the 108-year history of Blue Bell Creameries, the company said. Listeriosis is a serious infection caused by eating food contaminated with listeria, and primarily affects the elderly, pregnant women, newborns and people with weakened immune systems, according to the CDC. 
Symptoms of a listeria infection are fever and muscle aches, sometimes associated with diarrhea or other gastrointestinal symptoms. In the United States, an estimated 1,600 people become seriously ill each year, and approximately 16% of these illnesses result in death. Cervical infections caused by listeriosis in pregnant women may result in stillbirth or spontaneous abortion during the second or third trimesters. CNN's Debra Goldschmidt, Amanda Watts and Jacque Wilson contributed to this report.\n\noutput:",
        "prompt_spans": [
            [
                0,
                46
            ],
            [
                47,
                556
            ],
            [
                539,
                1047
            ],
            [
                1034,
                1539
            ],
            [
                1521,
                2028
            ],
            [
                2012,
                2520
            ],
            [
                2506,
                3017
            ],
            [
                3003,
                3505
            ],
            [
                3488,
                3946
            ],
            [
                3948,
                3955
            ]
        ]
    }
}
5. Load the model and tokenizer:
model_name = "../../model/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
device = "cuda"

Loading checkpoint shards: 100%|██████████| 2/2 [00:21<00:00, 10.71s/it]
6. Load the copy heads:
with open("../dataset/topk_heads.json", 'r') as f:
    copy_heads = json.load(f)[:32]

print(copy_heads)
print(len(copy_heads))

[[25, 0], [18, 13], [18, 10], [27, 9], [5, 29], [23, 8], [31, 28], [3, 0], [31, 24], [13, 20], [31, 18], [1, 14], [2, 5], [22, 10], [2, 22], [15, 7], [3, 19], [20, 17], [10, 20], [23, 30], [20, 22], [1, 27], [20, 1], [31, 19], [28, 18], [20, 15], [1, 21], [19, 1], [20, 5], [16, 1], [18, 9], [5, 13]]
32
7. Select the data type, which corresponds to the `model` field in the JSONL:
data_type = "llama-2-7b-chat"

select_response = []
i = 0
response[i]['model'] == data_type and response[i]["split"] == "test"

True
8. Extract the fields:
response_rag = response[i]['response']
source_id = response[i]['source_id']
temperature = response[i]['temperature']
prompt = source_info_dict[source_id]['prompt']
original_prompt_spans = source_info_dict[source_id]['prompt_spans'] # prompt segmentation spans
original_response_spans = response[i]['response_spans'] # response segmentation spans

print(response_rag)
print(source_id)
print(temperature)
print(prompt)
print(original_prompt_spans)
print(original_response_spans)

Blue Bell ice cream has temporarily shut down one of its manufacturing plants after discovering listeria contamination in a serving of ice cream produced at the plant. The Centers for Disease Control and Prevention (CDC) has warned consumers not to eat any Blue Bell-branded products made at the Broken Arrow, Oklahoma plant, including 3-ounce servings of ice cream marked with certain codes. This is the third time Blue Bell has taken action due to a listeria outbreak at a Kansas hospital that served the company's ice cream. Investigations into the possible connection between the ice cream and the infections are ongoing. The company has recalled other products and advises individuals and institutions to check their freezers for the recalled items and throw them away. This is the first product recall in Blue Bell's 108-year history.
15599
0.7
Summarize the following news within 161 words:
Blue Bell ice cream has temporarily shut down one of its manufacturing plants over the discovery of listeria contamination in a serving of ice cream originating from that plant. Public health officials warned consumers Friday not to eat any Blue Bell-branded products made at the company's Broken Arrow, Oklahoma, plant. That includes 3-ounce servings of Blue Bell ice cream from this plant that went to institutions in containers marked with the letters O, P, Q, R, S or T behind the coding date. The warning by the Centers for Disease Control and Prevention does not affect other Blue Bell ice cream, including other 3-ounce servings, not made at the plant. But Blue Bell has recalled other products. The company is shutting down the Broken Arrow facility "out of an abundance of caution" to search for a possible cause of contamination. It is the third time Blue Bell has taken action in light of a listeria outbreak at a Kansas hospital that served the company's ice cream. Listeria monocytogenes was recently found in a cup of ice cream recovered from the hospital. The cup contaminated with the bacteria was produced at the Broken Arrow plant in April 2014, Blue Bell said. And, according to the CDC, listeria bacteria was found in additional samples of the same product that were recovered from the plant. The bacteria in the hospital sample and the factory sample appeared to match each other genetically, the CDC said. But they did not appear identical to listeria samples taken from patients infected in the Kansas outbreak. In a separate outbreak in Texas, the CDC did find that listeria samples taken from patients who came down with listeriosis between 2010 and 2014 in a hospital that served 3-ounce Blue Bell cups matched the listeria in recovered samples. None of this means the ice cream is the source of either spate of the infections. "Investigation to determine whether these illnesses are related to exposure to Blue Bell products is ongoing," the CDC said. In early March, in light of the Kansas listeria outbreak, Blue Bell recalled a group of products made at a plant in Texas. It later added 3-ounce cup servings to the recall. Five people were infected and three died in the past year in Kansas from listeria that might be linked to Blue Bell Creameries products, according to the CDC. All five of them were hospitalized at the same hospital before developing listeriosis, the CDC said. At least four of them had consumed milkshakes made with Blue Bell ice cream before developing the infection. "We are devastated and know that Blue Bell has to be and can be better than this," Paul Kruse, Blue Bell CEO and president, said in a statement. "Quality and safety have always been our top priorities. We are deeply saddened and concerned for all those who have been affected." The CDC advises that individuals and institutions should check their freezers for the recalled products and throw them away. In a statement on its website, Blue Bell said "this recall in no way includes Blue Bell ice cream half gallons, pints, quarts, 3 gallons or other 3 oz. cups." This has been the first product recall in the 108-year history of Blue Bell Creameries, the company said. Listeriosis is a serious infection caused by eating food contaminated with listeria, and primarily affects the elderly, pregnant women, newborns and people with weakened immune systems, according to the CDC. Symptoms of a listeria infection are fever and muscle aches, sometimes associated with diarrhea or other gastrointestinal symptoms. 
In the United States, an estimated 1,600 people become seriously ill each year, and approximately 16% of these illnesses result in death. Cervical infections caused by listeriosis in pregnant women may result in stillbirth or spontaneous abortion during the second or third trimesters. CNN's Debra Goldschmidt, Amanda Watts and Jacque Wilson contributed to this report.

output:
[[0, 46], [47, 556], [539, 1047], [1034, 1539], [1521, 2028], [2012, 2520], [2506, 3017], [3003, 3505], [3488, 3946], [3948, 3955]]
[[0, 506], [491, 840]]
9. Construct the model input:
def add_special_template(prompt):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    return text

text = add_special_template(prompt[:12000])
print(text)

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Summarize the following news within 161 words:
Blue Bell ice cream has temporarily shut down one of its manufacturing plants over the discovery of listeria contamination in a serving of ice cream originating from that plant. Public health officials warned consumers Friday not to eat any Blue Bell-branded products made at the company's Broken Arrow, Oklahoma, plant. That includes 3-ounce servings of Blue Bell ice cream from this plant that went to institutions in containers marked with the letters O, P, Q, R, S or T behind the coding date. The warning by the Centers for Disease Control and Prevention does not affect other Blue Bell ice cream, including other 3-ounce servings, not made at the plant. But Blue Bell has recalled other products. The company is shutting down the Broken Arrow facility "out of an abundance of caution" to search for a possible cause of contamination. It is the third time Blue Bell has taken action in light of a listeria outbreak at a Kansas hospital that served the company's ice cream. Listeria monocytogenes was recently found in a cup of ice cream recovered from the hospital. The cup contaminated with the bacteria was produced at the Broken Arrow plant in April 2014, Blue Bell said. And, according to the CDC, listeria bacteria was found in additional samples of the same product that were recovered from the plant. The bacteria in the hospital sample and the factory sample appeared to match each other genetically, the CDC said. But they did not appear identical to listeria samples taken from patients infected in the Kansas outbreak. In a separate outbreak in Texas, the CDC did find that listeria samples taken from patients who came down with listeriosis between 2010 and 2014 in a hospital that served 3-ounce Blue Bell cups matched the listeria in recovered samples. None of this means the ice cream is the source of either spate of the infections. "Investigation to determine whether these illnesses are related to exposure to Blue Bell products is ongoing," the CDC said. In early March, in light of the Kansas listeria outbreak, Blue Bell recalled a group of products made at a plant in Texas. It later added 3-ounce cup servings to the recall. Five people were infected and three died in the past year in Kansas from listeria that might be linked to Blue Bell Creameries products, according to the CDC. All five of them were hospitalized at the same hospital before developing listeriosis, the CDC said. At least four of them had consumed milkshakes made with Blue Bell ice cream before developing the infection. "We are devastated and know that Blue Bell has to be and can be better than this," Paul Kruse, Blue Bell CEO and president, said in a statement. "Quality and safety have always been our top priorities. We are deeply saddened and concerned for all those who have been affected." The CDC advises that individuals and institutions should check their freezers for the recalled products and throw them away. In a statement on its website, Blue Bell said "this recall in no way includes Blue Bell ice cream half gallons, pints, quarts, 3 gallons or other 3 oz. cups." This has been the first product recall in the 108-year history of Blue Bell Creameries, the company said. Listeriosis is a serious infection caused by eating food contaminated with listeria, and primarily affects the elderly, pregnant women, newborns and people with weakened immune systems, according to the CDC. Symptoms of a listeria infection are fever and muscle aches, sometimes associated with diarrhea or other gastrointestinal symptoms. 
In the United States, an estimated 1,600 people become seriously ill each year, and approximately 16% of these illnesses result in death. Cervical infections caused by listeriosis in pregnant women may result in stillbirth or spontaneous abortion during the second or third trimesters. CNN's Debra Goldschmidt, Amanda Watts and Jacque Wilson contributed to this report.

output: [/INST]
10. Build the model's full input plus output:
input_text = text + response_rag
print(input_text)

<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Summarize the following news within 161 words:
Blue Bell ice cream has temporarily shut down one of its manufacturing plants over the discovery of listeria contamination in a serving of ice cream originating from that plant. Public health officials warned consumers Friday not to eat any Blue Bell-branded products made at the company's Broken Arrow, Oklahoma, plant. That includes 3-ounce servings of Blue Bell ice cream from this plant that went to institutions in containers marked with the letters O, P, Q, R, S or T behind the coding date. The warning by the Centers for Disease Control and Prevention does not affect other Blue Bell ice cream, including other 3-ounce servings, not made at the plant. But Blue Bell has recalled other products. The company is shutting down the Broken Arrow facility "out of an abundance of caution" to search for a possible cause of contamination. It is the third time Blue Bell has taken action in light of a listeria outbreak at a Kansas hospital that served the company's ice cream. Listeria monocytogenes was recently found in a cup of ice cream recovered from the hospital. The cup contaminated with the bacteria was produced at the Broken Arrow plant in April 2014, Blue Bell said. And, according to the CDC, listeria bacteria was found in additional samples of the same product that were recovered from the plant. The bacteria in the hospital sample and the factory sample appeared to match each other genetically, the CDC said. But they did not appear identical to listeria samples taken from patients infected in the Kansas outbreak. In a separate outbreak in Texas, the CDC did find that listeria samples taken from patients who came down with listeriosis between 2010 and 2014 in a hospital that served 3-ounce Blue Bell cups matched the listeria in recovered samples. None of this means the ice cream is the source of either spate of the infections. "Investigation to determine whether these illnesses are related to exposure to Blue Bell products is ongoing," the CDC said. In early March, in light of the Kansas listeria outbreak, Blue Bell recalled a group of products made at a plant in Texas. It later added 3-ounce cup servings to the recall. Five people were infected and three died in the past year in Kansas from listeria that might be linked to Blue Bell Creameries products, according to the CDC. All five of them were hospitalized at the same hospital before developing listeriosis, the CDC said. At least four of them had consumed milkshakes made with Blue Bell ice cream before developing the infection. "We are devastated and know that Blue Bell has to be and can be better than this," Paul Kruse, Blue Bell CEO and president, said in a statement. "Quality and safety have always been our top priorities. We are deeply saddened and concerned for all those who have been affected." The CDC advises that individuals and institutions should check their freezers for the recalled products and throw them away. In a statement on its website, Blue Bell said "this recall in no way includes Blue Bell ice cream half gallons, pints, quarts, 3 gallons or other 3 oz. cups." This has been the first product recall in the 108-year history of Blue Bell Creameries, the company said. Listeriosis is a serious infection caused by eating food contaminated with listeria, and primarily affects the elderly, pregnant women, newborns and people with weakened immune systems, according to the CDC. Symptoms of a listeria infection are fever and muscle aches, sometimes associated with diarrhea or other gastrointestinal symptoms. 
In the United States, an estimated 1,600 people become seriously ill each year, and approximately 16% of these illnesses result in death. Cervical infections caused by listeriosis in pregnant women may result in stillbirth or spontaneous abortion during the second or third trimesters. CNN's Debra Goldschmidt, Amanda Watts and Jacque Wilson contributed to this report.

output: [/INST]Blue Bell ice cream has temporarily shut down one of its manufacturing plants after discovering listeria contamination in a serving of ice cream produced at the plant. The Centers for Disease Control and Prevention (CDC) has warned consumers not to eat any Blue Bell-branded products made at the Broken Arrow, Oklahoma plant, including 3-ounce servings of ice cream marked with certain codes. This is the third time Blue Bell has taken action due to a listeria outbreak at a Kansas hospital that served the company's ice cream. Investigations into the possible connection between the ice cream and the infections are ongoing. The company has recalled other products and advises individuals and institutions to check their freezers for the recalled items and throw them away. This is the first product recall in Blue Bell's 108-year history.
11. Get the token ID sequences:
input_ids = tokenizer([input_text], return_tensors="pt").input_ids
prefix_ids = tokenizer([text], return_tensors="pt").input_ids
continue_ids = input_ids[0, prefix_ids.shape[-1]:]

print(input_ids.shape)
print(prefix_ids.shape)
print(continue_ids.shape)

torch.Size([1, 1189])
torch.Size([1, 995])
torch.Size([194])
12. Locate the hallucinated spans:
def calculate_hallucination_spans(response, text, response_rag, tokenizer, prefix_len):
    hallucination_span = []

    for item in response:
        start_id = item['start']
        end_id = item['end']
        start_text = text + response_rag[:start_id]
        end_text = text + response_rag[:end_id]
        start_text_id = tokenizer(start_text, return_tensors="pt").input_ids
        end_text_id = tokenizer(end_text, return_tensors="pt").input_ids
        start_id = start_text_id.shape[-1]
        end_id = end_text_id.shape[-1]
        hallucination_span.append([start_id, end_id])

    return hallucination_span

if "labels" in response[i].keys():
    hallucination_spans = calculate_hallucination_spans(response[i]['labels'], text, response_rag, tokenizer, prefix_ids.shape[-1])
else:
    hallucination_spans = []
13. Locate the token ID spans corresponding to the prompt (the `- 4` offset is checked after the output):
def calculate_prompt_spans(raw_prompt_spans, prompt, tokenizer):
    prompt_spans = []

    for item in raw_prompt_spans:
        start_id = item[0]
        end_id = item[1]
        start_text = prompt[:start_id]
        end_text = prompt[:end_id]
        added_start_text = add_special_template(start_text)
        added_end_text = add_special_template(end_text)

        # Subtract 4 to drop the special tokens the template appends after the user content
        start_text_id = tokenizer(added_start_text, return_tensors="pt").input_ids.shape[-1] - 4
        end_text_id = tokenizer(added_end_text, return_tensors="pt").input_ids.shape[-1] - 4
        prompt_spans.append([start_text_id, end_text_id])
    return prompt_spans

prompt_spans = calculate_prompt_spans(source_info_dict[source_id]['prompt_spans'], prompt, tokenizer)
print(prompt_spans)

[[22, 37], [37, 155], [151, 275], [270, 390], [387, 513], [507, 629], [626, 740], [737, 875], [869, 987], [987, 991]]
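add_special_template is defined elsewhere in detect.py and is not shown here. For Llama-2-chat it plausibly wraps the text in the [INST] ... [/INST] template, in which case the 4 subtracted tokens would be the leading <s> plus the three tokens that "[INST]" tokenizes into. A sketch under that assumption (not the repository's exact code):

def add_special_template(text):
    # assumed Llama-2-chat wrapping; the real helper lives in detect.py
    return f"[INST] {text} [/INST]"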
  1. Locate the token ID spans corresponding to the response:
def calculate_respond_spans(raw_response_spans, text, response_rag, tokenizer):
    respond_spans = []
    for item in raw_response_spans:
        start_id = item[0]
        end_id = item[1]
        start_text = text + response_rag[:start_id]
        end_text = text + response_rag[:end_id]
        start_text_id = tokenizer(start_text, return_tensors="pt").input_ids
        end_text_id = tokenizer(end_text, return_tensors="pt").input_ids
        start_id = start_text_id.shape[-1]
        end_id = end_text_id.shape[-1]
        respond_spans.append([start_id, end_id])
    return respond_spans

respond_spans = calculate_respond_spans(response[i]['response_spans'], text, response_rag, tokenizer)
print(respond_spans)

[[995, 1114], [1112, 1189]]
  1. Run model inference:
start_p, end_p = None, None
start, number = 0, 32

with torch.no_grad():
    logits_dict, outputs = model(
        input_ids=input_ids,
        return_dict=True,
        output_attentions=True,
        output_hidden_states=True,
        knowledge_layers=list(range(start, number))
    )

print(outputs.keys())

odict_keys(['logits', 'past_key_values', 'hidden_states', 'attentions'])
  1. Get the MLP-layer outputs:
logits_dict = {key: [value[0].to(device), value[1].to(device)] for key, value in logits_dict.items()}

print(logits_dict.keys())
print(logits_dict[0][0].shape, logits_dict[0][1].shape)

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])
torch.Size([1, 1189, 32000]) torch.Size([1, 1189, 32000])
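The knowledge_layers argument only exists in the modified transformers package. Judging from the shapes above and the PKS computation below, each logits_dict entry holds a pair of vocabulary-logit tensors per layer, presumably the residual stream before and after that layer's FFN, both projected through the LM head (an assumption about the patch, not verified here). A self-contained toy sketch of that idea:

import torch

hidden = torch.randn(1, 8, 16) # toy residual stream: batch 1, 8 tokens, dim 16
mlp = torch.nn.Linear(16, 16) # stand-in for one layer's FFN
lm_head = torch.nn.Linear(16, 100, bias=False) # stand-in for the LM head

pre_ffn_logits = lm_head(hidden) # vocab logits without this layer's FFN
post_ffn_logits = lm_head(hidden + mlp(hidden)) # vocab logits with the FFN contribution
pair = [pre_ffn_logits, post_ffn_logits] # each of shape [1, 8, 100]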
  1. Get the hidden states:
hidden_states = outputs["hidden_states"]
print(type(hidden_states))
print(len(hidden_states)) # the embedding layer + 32 transformer layers
print(hidden_states[0].shape) # shape of each layer

last_hidden_states = hidden_states[-1][0, :, :]
print(last_hidden_states.shape)

<class 'tuple'>
33
torch.Size([1, 1189, 4096])
torch.Size([1189, 4096])
  1. Define the hallucination-detection variables:
external_similarity = []
parameter_knowledge_difference = []
hallucination_label = []
  1. Compute the ECS (External Context Score):
def calculate_sentence_similarity(r_text, p_text):
    part_embedding = bge_model.encode([r_text], normalize_embeddings=True)
    q_embeddings = bge_model.encode([p_text], normalize_embeddings=True)

    # compute the score via dot product, since the embeddings are already normalized
    scores_named = np.matmul(q_embeddings, part_embedding.T).flatten()
    return float(scores_named[0])

span_score_dict = []
for r_id, r_span in enumerate(respond_spans):
    layer_head_span = {}
    for attentions_layer_id in range(len(outputs.attentions)): # every layer
        for head_id in range(outputs.attentions[attentions_layer_id].shape[1]): # every head in the layer
            if [attentions_layer_id, head_id] in copy_heads: # keep only the heads listed in copy_heads
                layer_head = (attentions_layer_id, head_id)
                attention_score = outputs.attentions[attentions_layer_id][0, head_id, :, :]

                p_span_score_dict = []
                for p_span in prompt_spans:
                    # from the attention matrix, take the rows of this response span and the
                    # columns of this prompt span; the element sum of that sub-matrix is the
                    # attention the response span pays to the prompt span
                    p_span_score_dict.append([p_span, torch.sum(attention_score[r_span[0]:r_span[1], p_span[0]:p_span[1]]).cpu().item()])
                # pick the prompt span with the highest attention score
                p_id = max(range(len(p_span_score_dict)), key=lambda i: p_span_score_dict[i][1])
                # take the text of the most-attended prompt span and of this response span
                prompt_span_text, respond_span_text = prompt[original_prompt_spans[p_id][0]:original_prompt_spans[p_id][1]], response_rag[original_response_spans[r_id][0]:original_response_spans[r_id][1]]
                # compute the embedding similarity
                layer_head_span[str(layer_head)] = calculate_sentence_similarity(prompt_span_text, respond_span_text)

print(outputs.attentions[attentions_layer_id].shape)
print(outputs.attentions[attentions_layer_id][:, head_id, :, :].shape)
print(p_span_score_dict)
print(prompt_span_text)
print(respond_span_text)
print(layer_head_span)

torch.Size([1, 32, 1189, 1189])
torch.Size([1, 1189, 1189])
[[[22, 37], 0.11279296875], [[37, 155], 4.30078125], [[151, 275], 1.498046875], [[270, 390], 0.69775390625], [[387, 513], 0.25341796875], [[507, 629], 0.27685546875], [[626, 740], 0.76806640625], [[737, 875], 1.3994140625], [[869, 987], 6.05078125], [[987, 991], 0.0654296875]]
and muscle aches, sometimes associated with diarrhea or other gastrointestinal symptoms. In the United States, an estimated 1,600 people become seriously ill each year, and approximately 16% of these illnesses result in death. Cervical infections caused by listeriosis in pregnant women may result in stillbirth or spontaneous abortion during the second or third trimesters. CNN's Debra Goldschmidt, Amanda Watts and Jacque Wilson contributed to this report.
that served the company's ice cream. Investigations into the possible connection between the ice cream and the infections are ongoing. The company has recalled other products and advises individuals and institutions to check their freezers for the recalled items and throw them away. This is the first product recall in Blue Bell's 108-year history.
{'(1, 14)': 0.7934820652008057, '(1, 21)': 0.5256378650665283, '(1, 27)': 0.7934820652008057, '(2, 5)': 0.5256378650665283, '(2, 22)': 0.5256378650665283, '(3, 0)': 0.5256378650665283, '(3, 19)': 0.5256378650665283, '(5, 13)': 0.5256378650665283, '(5, 29)': 0.5256378650665283, '(10, 20)': 0.5256378650665283, '(13, 20)': 0.5256378650665283, '(15, 7)': 0.5256378650665283, '(16, 1)': 0.7934820652008057, '(18, 9)': 0.7934820652008057, '(18, 10)': 0.7934820652008057, '(18, 13)': 0.5256378650665283, '(19, 1)': 0.5256378650665283, '(20, 1)': 0.7861142158508301, '(20, 5)': 0.7861142158508301, '(20, 15)': 0.5256378650665283, '(20, 17)': 0.7934820652008057, '(20, 22)': 0.5256378650665283, '(22, 10)': 0.5256378650665283, '(23, 8)': 0.7861142158508301, '(23, 30)': 0.5256378650665283, '(25, 0)': 0.7861142158508301, '(27, 9)': 0.5256378650665283, '(28, 18)': 0.5256378650665283, '(31, 18)': 0.5256378650665283, '(31, 19)': 0.5256378650665283, '(31, 24)': 0.7118228077888489, '(31, 28)': 0.5256378650665283}
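Note that because bge-base-en-v1.5 is encoded with normalize_embeddings=True, the dot product in calculate_sentence_similarity is exactly cosine similarity. A quick standalone check (toy vectors):

import numpy as np

v = np.array([[0.6, 0.8]]) # unit-norm "response" embedding
w = np.array([[1.0, 0.0]]) # unit-norm "prompt" embedding
print(np.matmul(v, w.T).item()) # 0.6, the cosine of the angle between them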
  1. The JS-divergence function:
def calculate_dist_2d(sep_vocabulary_dist, sep_attention_dist):
    softmax_mature_layer = F.softmax(sep_vocabulary_dist, dim=-1)
    softmax_anchor_layer = F.softmax(sep_attention_dist, dim=-1)

    M = 0.5 * (softmax_mature_layer + softmax_anchor_layer)

    log_softmax_mature_layer = F.log_softmax(sep_vocabulary_dist, dim=-1)
    log_softmax_anchor_layer = F.log_softmax(sep_attention_dist, dim=-1)

    kl1 = F.kl_div(log_softmax_mature_layer, M, reduction='none').sum(dim=-1)
    kl2 = F.kl_div(log_softmax_anchor_layer, M, reduction='none').sum(dim=-1)
    js_divs = 0.5 * (kl1 + kl2)

    scores = js_divs.cpu().tolist() # per-token divergences; the function returns their sum

    return sum(scores)
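This is a Jensen-Shannon-style divergence: with M = (P + Q) / 2 it averages KL(M || P) and KL(M || Q) (F.kl_div treats its second argument as the reference distribution), then sums over the tokens of the span. A quick behavioral check, runnable in the same session (toy logits, not from the model):

a = torch.randn(5, 100) # 5 token positions over a 100-word toy vocabulary
print(calculate_dist_2d(a, a)) # 0.0: identical distributions
print(calculate_dist_2d(a, a + 5.0)) # ~0.0: softmax is shift-invariant
print(calculate_dist_2d(a, torch.randn(5, 100))) # clearly positive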

  1. The hallucination-span check function:
def is_hallucination_span(r_span, hallucination_spans):
    for token_id in range(r_span[0], r_span[1]):
        for span in hallucination_spans:
            if token_id >= span[0] and token_id <= span[1]:
                return True
    return False
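A response span is labeled as hallucinated as soon as any token position inside it falls within an annotated hallucination span. For example (hypothetical spans):

print(is_hallucination_span([10, 20], [[15, 30]])) # True: positions 15-19 overlap
print(is_hallucination_span([0, 5], [[15, 30]])) # False: no overlap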
  1. Compute the PKS (Parametric Knowledge Score):
for r_id, r_span in enumerate(respond_spans):
    parameter_knowledge_scores = [calculate_dist_2d(value[0][0, r_span[0]:r_span[1], :], value[1][0, r_span[0]:r_span[1], :]) for value in logits_dict.values()]
    parameter_knowledge_dict = {f"layer_{i}": value for i, value in enumerate(parameter_knowledge_scores)}

    span_score_dict.append({
        "prompt_attention_score": layer_head_span,
        "r_span": r_span,
        "hallucination_label": 1 if is_hallucination_span(r_span, hallucination_spans) else 0,
        "parameter_knowledge_scores": parameter_knowledge_dict
    })

parameter_knowledge_dict

{'layer_0': 6.265625,
'layer_1': 11.825347900390625,
'layer_2': 7.304943084716797,
'layer_3': 8.502777099609375,
'layer_4': 8.751604080200195,
'layer_5': 9.063507080078125,
'layer_6': 10.59210205078125,
'layer_7': 10.3037109375,
'layer_8': 9.185203552246094,
'layer_9': 8.613269805908203,
'layer_10': 8.374945402145386,
'layer_11': 6.96978759765625,
'layer_12': 7.036556243896484,
'layer_13': 6.907812118530273,
'layer_14': 6.779157400131226,
'layer_15': 6.831340312957764,
'layer_16': 5.449123382568359,
'layer_17': 7.206271290779114,
'layer_18': 4.6526288986206055,
'layer_19': 5.492448091506958,
'layer_20': 4.489939272403717,
'layer_21': 3.39082270860672,
'layer_22': 2.022574782371521,
'layer_23': 1.9388360977172852,
'layer_24': 4.99117386341095,
'layer_25': 15.26219892501831,
'layer_26': 6.535698175430298,
'layer_27': 6.107271254062653,
'layer_28': 3.5565916895866394,
'layer_29': 1.0493692755699158,
'layer_30': 4.485132694244385,
'layer_31': 5.870414137840271}
  1. Inspect the results:
response[i]["scores"] = span_score_dict
select_response.append(response[i])

print(json.dumps(select_response, ensure_ascii=False, indent=4))

[
    {
        "id": "45",
        "source_id": "15599",
        "model": "llama-2-7b-chat",
        "temperature": 0.7,
        "labels": [],
        "split": "test",
        "quality": "good",
        "response": "Blue Bell ice cream has temporarily shut down one of its manufacturing plants after discovering listeria contamination in a serving of ice cream produced at the plant. The Centers for Disease Control and Prevention (CDC) has warned consumers not to eat any Blue Bell-branded products made at the Broken Arrow, Oklahoma plant, including 3-ounce servings of ice cream marked with certain codes. This is the third time Blue Bell has taken action due to a listeria outbreak at a Kansas hospital that served the company's ice cream. Investigations into the possible connection between the ice cream and the infections are ongoing. The company has recalled other products and advises individuals and institutions to check their freezers for the recalled items and throw them away. This is the first product recall in Blue Bell's 108-year history.",
        "response_spans": [
            [
                0,
                506
            ],
            [
                491,
                840
            ]
        ],
        "scores": [
            {
                "prompt_attention_score": {
                    "(1, 14)": 0.7934820652008057,
                    "(1, 21)": 0.5256378650665283,
                    "(1, 27)": 0.7934820652008057,
                    "(2, 5)": 0.5256378650665283,
                    "(2, 22)": 0.5256378650665283,
                    "(3, 0)": 0.5256378650665283,
                    "(3, 19)": 0.5256378650665283,
                    "(5, 13)": 0.5256378650665283,
                    "(5, 29)": 0.5256378650665283,
                    "(10, 20)": 0.5256378650665283,
                    "(13, 20)": 0.5256378650665283,
                    "(15, 7)": 0.5256378650665283,
                    "(16, 1)": 0.7934820652008057,
                    "(18, 9)": 0.7934820652008057,
                    "(18, 10)": 0.7934820652008057,
                    "(18, 13)": 0.5256378650665283,
                    "(19, 1)": 0.5256378650665283,
                    "(20, 1)": 0.7861142158508301,
                    "(20, 5)": 0.7861142158508301,
                    "(20, 15)": 0.5256378650665283,
                    "(20, 17)": 0.7934820652008057,
                    "(20, 22)": 0.5256378650665283,
                    "(22, 10)": 0.5256378650665283,
                    "(23, 8)": 0.7861142158508301,
                    "(23, 30)": 0.5256378650665283,
                    "(25, 0)": 0.7861142158508301,
                    "(27, 9)": 0.5256378650665283,
                    "(28, 18)": 0.5256378650665283,
                    "(31, 18)": 0.5256378650665283,
                    "(31, 19)": 0.5256378650665283,
                    "(31, 24)": 0.7118228077888489,
                    "(31, 28)": 0.5256378650665283
                },
                "r_span": [
                    995,
                    1114
                ],
                "hallucination_label": 0,
                "parameter_knowledge_scores": {
                    "layer_0": 10.039642333984375,
                    "layer_1": 19.6171875,
                    "layer_2": 13.364913940429688,
                    "layer_3": 15.792007446289062,
                    "layer_4": 16.167789459228516,
                    "layer_5": 14.672402381896973,
                    "layer_6": 14.83043384552002,
                    "layer_7": 16.115909576416016,
                    "layer_8": 14.244064331054688,
                    "layer_9": 13.896240234375,
                    "layer_10": 13.285934448242188,
                    "layer_11": 11.34033203125,
                    "layer_12": 10.39111328125,
                    "layer_13": 9.857498168945312,
                    "layer_14": 9.97247314453125,
                    "layer_15": 9.715774536132812,
                    "layer_16": 9.486862182617188,
                    "layer_17": 7.691881537437439,
                    "layer_18": 11.035508871078491,
                    "layer_19": 8.649388074874878,
                    "layer_20": 7.8278902769088745,
                    "layer_21": 4.75305837392807,
                    "layer_22": 4.245644927024841,
                    "layer_23": 6.389559030532837,
                    "layer_24": 15.33685153722763,
                    "layer_25": 2.7393543124198914,
                    "layer_26": 2.4489996433258057,
                    "layer_27": 1.971637487411499,
                    "layer_28": 5.669117510318756,
                    "layer_29": 2.5789473056793213,
                    "layer_30": 7.760132610797882,
                    "layer_31": 6.783948659896851
                }
            },
            {
                "prompt_attention_score": {
                    "(1, 14)": 0.7934820652008057,
                    "(1, 21)": 0.5256378650665283,
                    "(1, 27)": 0.7934820652008057,
                    "(2, 5)": 0.5256378650665283,
                    "(2, 22)": 0.5256378650665283,
                    "(3, 0)": 0.5256378650665283,
                    "(3, 19)": 0.5256378650665283,
                    "(5, 13)": 0.5256378650665283,
                    "(5, 29)": 0.5256378650665283,
                    "(10, 20)": 0.5256378650665283,
                    "(13, 20)": 0.5256378650665283,
                    "(15, 7)": 0.5256378650665283,
                    "(16, 1)": 0.7934820652008057,
                    "(18, 9)": 0.7934820652008057,
                    "(18, 10)": 0.7934820652008057,
                    "(18, 13)": 0.5256378650665283,
                    "(19, 1)": 0.5256378650665283,
                    "(20, 1)": 0.7861142158508301,
                    "(20, 5)": 0.7861142158508301,
                    "(20, 15)": 0.5256378650665283,
                    "(20, 17)": 0.7934820652008057,
                    "(20, 22)": 0.5256378650665283,
                    "(22, 10)": 0.5256378650665283,
                    "(23, 8)": 0.7861142158508301,
                    "(23, 30)": 0.5256378650665283,
                    "(25, 0)": 0.7861142158508301,
                    "(27, 9)": 0.5256378650665283,
                    "(28, 18)": 0.5256378650665283,
                    "(31, 18)": 0.5256378650665283,
                    "(31, 19)": 0.5256378650665283,
                    "(31, 24)": 0.7118228077888489,
                    "(31, 28)": 0.5256378650665283
                },
                "r_span": [
                    1112,
                    1189
                ],
                "hallucination_label": 0,
                "parameter_knowledge_scores": {
                    "layer_0": 6.265625,
                    "layer_1": 11.825347900390625,
                    "layer_2": 7.304943084716797,
                    "layer_3": 8.502777099609375,
                    "layer_4": 8.751604080200195,
                    "layer_5": 9.063507080078125,
                    "layer_6": 10.59210205078125,
                    "layer_7": 10.3037109375,
                    "layer_8": 9.185203552246094,
                    "layer_9": 8.613269805908203,
                    "layer_10": 8.374945402145386,
                    "layer_11": 6.96978759765625,
                    "layer_12": 7.036556243896484,
                    "layer_13": 6.907812118530273,
                    "layer_14": 6.779157400131226,
                    "layer_15": 6.831340312957764,
                    "layer_16": 5.449123382568359,
                    "layer_17": 7.206271290779114,
                    "layer_18": 4.6526288986206055,
                    "layer_19": 5.492448091506958,
                    "layer_20": 4.489939272403717,
                    "layer_21": 3.39082270860672,
                    "layer_22": 2.022574782371521,
                    "layer_23": 1.9388360977172852,
                    "layer_24": 4.99117386341095,
                    "layer_25": 15.26219892501831,
                    "layer_26": 6.535698175430298,
                    "layer_27": 6.107271254062653,
                    "layer_28": 3.5565916895866394,
                    "layer_29": 1.0493692755699158,
                    "layer_30": 4.485132694244385,
                    "layer_31": 5.870414137840271
                }
            }
        ]
    }
]
  1. Save the results:
with open("../output/llama2_7B_response_chunk.json", "w") as f:
    json.dump(select_response, f, ensure_ascii=False)

2.3 regress.py

2.3.1 token level

  1. Import the required packages:
import pandas as pd
import json
import argparse
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
from scipy.stats import pearsonr
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
  1. Load copy_heads:
topk_head_path = "../dataset/topk_heads.json"

with open(topk_head_path, 'r') as f:
    copy_heads = json.load(f)
sorted_copy_heads = sorted(copy_heads, key=lambda x: (x[0], x[1])) # sort by layer index, then head index, ascending

print(sorted_copy_heads)

[[1, 14], [1, 21], [1, 27], [2, 5], [2, 22], [3, 0], [3, 19], [5, 13], [5, 29], [10, 20], [13, 20], [15, 7], [16, 1], [18, 9], [18, 10], [18, 13], [19, 1], [20, 1], [20, 5], [20, 15], [20, 17], [20, 22], [22, 10], [23, 8], [23, 30], [25, 0], [27, 9], [28, 18], [31, 18], [31, 19], [31, 24], [31, 28]]
  1. Build the dataset:
def construct_dataframe(file_path, number):
    # load the dataset
    with open(file_path, "r") as f:
        response = json.load(f)
    print(len(response))
    print(response[0].keys())

    # define the columns: identifier, ECS features, PKS features, hallucination label
    data_dict = {
        "identifier": [],
        **{f"external_similarity_{k}": [] for k in range(number)},
        **{f"parameter_knowledge_difference_{k}": [] for k in range(number)},
        "hallucination_label": []
    }

    # response_i is the i-th response; item_j is the j-th token ID
    for i, resp in enumerate(response): # iterate over responses
        if resp["split"] != "test":
            continue
        for j in range(len(resp["external_similarity"])): # iterate over token IDs
            data_dict["identifier"].append(f"response_{i}_item_{j}")
            for k in range(number): # iterate over the individual scores
                data_dict[f"external_similarity_{k}"].append(resp["external_similarity"][j][k])
                data_dict[f"parameter_knowledge_difference_{k}"].append(resp["parameter_knowledge_difference"][j][k])
            data_dict["hallucination_label"].append(resp["hallucination_label"][j])

    df = pd.DataFrame(data_dict) # convert to a DataFrame

    print(df["hallucination_label"].value_counts(normalize=True)) # inspect the hallucination-label ratio
    return df

# load the data
data_path = "../dataset/llama2_7B_response_token.json"
number = 32
df = construct_dataframe(data_path, number) # build the dataset

450
dict_keys(['id', 'source_id', 'model', 'temperature', 'labels', 'split', 'quality', 'response', 'external_similarity', 'parameter_knowledge_difference', 'hallucination_label'])
hallucination_label
0    0.937957
1    0.062043
Name: proportion, dtype: float64
  1. Compute the AUC and PCC of ECS and PKS:
# compute the AUC (area under the ROC curve) and the Pearson correlation coefficient (PCC) for ECS and PKS
def calculate_auc_pcc(df, number):
    auc_external_similarity = []
    pearson_external_similarity = []

    auc_parameter_knowledge_difference = []
    pearson_parameter_knowledge_difference = []

    for k in range(number):
        # AUC and PCC between ECS and the hallucination label: negatively correlated, so the label is flipped
        auc_ext = roc_auc_score(1 - df['hallucination_label'], df[f'external_similarity_{k}'])
        pearson_ext, _ = pearsonr(df[f'external_similarity_{k}'], 1 - df['hallucination_label'])
        auc_external_similarity.append((auc_ext, f'external_similarity_{k}'))
        pearson_external_similarity.append((pearson_ext, f'external_similarity_{k}'))

        # AUC and PCC between PKS and the hallucination label: positively correlated
        auc_param = roc_auc_score(df['hallucination_label'], df[f'parameter_knowledge_difference_{k}'])
        if df[f'parameter_knowledge_difference_{k}'].nunique() == 1: # check whether a PKS column is constant
            print(k)
        pearson_param, _ = pearsonr(df[f'parameter_knowledge_difference_{k}'], df['hallucination_label'])
        auc_parameter_knowledge_difference.append((auc_param, f'parameter_knowledge_difference_{k}'))
        pearson_parameter_knowledge_difference.append((pearson_param, f'parameter_knowledge_difference_{k}'))

    return auc_external_similarity, auc_parameter_knowledge_difference

# compute the AUC of each ECS / PKS feature against the hallucination label
auc_external_similarity, auc_parameter_knowledge_difference = calculate_auc_pcc(df, number)
print(auc_external_similarity)
print(auc_parameter_knowledge_difference)

[(0.5241438817940267, 'external_similarity_0'), (0.5303407378212408, 'external_similarity_1'), (0.5210631524747991, 'external_similarity_2'), (0.5298950697568433, 'external_similarity_3'), (0.5381268749312161, 'external_similarity_4'), (0.5443052544867116, 'external_similarity_5'), (0.5284039282339847, 'external_similarity_6'), (0.5456647839911151, 'external_similarity_7'), (0.5235929010288944, 'external_similarity_8'), (0.54994054648032, 'external_similarity_9'), (0.551097841783863, 'external_similarity_10'), (0.5452417434757976, 'external_similarity_11'), (0.5516958013996187, 'external_similarity_12'), (0.560198923892558, 'external_similarity_13'), (0.5523331302903907, 'external_similarity_14'), (0.5286836290578362, 'external_similarity_15'), (0.5421496701504019, 'external_similarity_16'), (0.5671844186584903, 'external_similarity_17'), (0.5620680003417163, 'external_similarity_18'), (0.5234921142777087, 'external_similarity_19'), (0.5597268861531965, 'external_similarity_20'), (0.5459607978014505, 'external_similarity_21'), (0.5186293371214259, 'external_similarity_22'), (0.5639456416859279, 'external_similarity_23'), (0.5400120719837709, 'external_similarity_24'), (0.5673763213378886, 'external_similarity_25'), (0.5318546025350572, 'external_similarity_26'), (0.5223420844588798, 'external_similarity_27'), (0.5204039479779862, 'external_similarity_28'), (0.5314390022982304, 'external_similarity_29'), (0.5486740941122893, 'external_similarity_30'), (0.5228121170168041, 'external_similarity_31')]
[(0.4957178339023127, 'parameter_knowledge_difference_0'), (0.4910236843698426, 'parameter_knowledge_difference_1'), (0.5019680855806161, 'parameter_knowledge_difference_2'), (0.5017760641733695, 'parameter_knowledge_difference_3'), (0.5065210786948295, 'parameter_knowledge_difference_4'), (0.5304081180737897, 'parameter_knowledge_difference_5'), (0.5378957151086584, 'parameter_knowledge_difference_6'), (0.5357647382182317, 'parameter_knowledge_difference_7'), (0.5370695759592025, 'parameter_knowledge_difference_8'), (0.5203812973429356, 'parameter_knowledge_difference_9'), (0.46947324971983245, 'parameter_knowledge_difference_10'), (0.5191418423684, 'parameter_knowledge_difference_11'), (0.5176579289529697, 'parameter_knowledge_difference_12'), (0.5108610113873528, 'parameter_knowledge_difference_13'), (0.5318679407293433, 'parameter_knowledge_difference_14'), (0.523121552570652, 'parameter_knowledge_difference_15'), (0.5637065413888521, 'parameter_knowledge_difference_16'), (0.5603452361775584, 'parameter_knowledge_difference_17'), (0.5589974486374809, 'parameter_knowledge_difference_18'), (0.5465907018043785, 'parameter_knowledge_difference_19'), (0.5557532354685331, 'parameter_knowledge_difference_20'), (0.5648406814639895, 'parameter_knowledge_difference_21'), (0.5478857807758221, 'parameter_knowledge_difference_22'), (0.5692361568028578, 'parameter_knowledge_difference_23'), (0.5659032649245824, 'parameter_knowledge_difference_24'), (0.5641498623795829, 'parameter_knowledge_difference_25'), (0.5503712647298249, 'parameter_knowledge_difference_26'), (0.5519855226339968, 'parameter_knowledge_difference_27'), (0.559005634263019, 'parameter_knowledge_difference_28'), (0.5528760122772071, 'parameter_knowledge_difference_29'), (0.5467555938982988, 'parameter_knowledge_difference_30'), (0.5321519256497191, 'parameter_knowledge_difference_31')]
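The label flip 1 - df['hallucination_label'] deals with ECS being negatively correlated with hallucination: a score that ranks faithful samples higher would otherwise land below 0.5 AUC, and flipping the label mirrors it to 1 - AUC. A standalone toy check (made-up labels and scores):

import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 0, 1, 1]) # 1 = hallucinated
s = np.array([0.9, 0.8, 0.2, 0.1]) # high ECS-like score => faithful
print(roc_auc_score(y, s)) # 0.0 against the raw label
print(roc_auc_score(1 - y, s)) # 1.0 == 1 - 0.0 after flipping the label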
  1. Compute the response-level AUC and PCC:
# compute the response-level AUC and PCC
def calculate_auc_pcc_32_32(df, top_n, top_k, alpha, auc_external_similarity, auc_parameter_knowledge_difference, m=1):
    collect_info = {}
    # select the top_n ECS features with the highest AUC
    top_auc_external_similarity = sorted(auc_external_similarity, reverse=True)[:top_n]
    print(top_auc_external_similarity)
    collect_info.update({"select_heads": [sorted_copy_heads[eval(name.split('_')[-1])] for _, name in top_auc_external_similarity]})

    # select the top_k PKS features with the highest AUC
    top_auc_parameter_knowledge_difference = sorted(auc_parameter_knowledge_difference, reverse=True)[:top_k]
    print(top_auc_parameter_knowledge_difference)
    base_layer = 0
    collect_info.update({"select_layers": [eval(name.split('_')[-1]) + base_layer for _, name in top_auc_parameter_knowledge_difference]})

    # sum the df columns of the selected features to get the aggregated ECS and PKS
    df['external_similarity_sum'] = df[[col for _, col in top_auc_external_similarity]].sum(axis=1)
    df['parameter_knowledge_difference_sum'] = df[[col for _, col in top_auc_parameter_knowledge_difference]].sum(axis=1)

    # AUC of the aggregated ECS and PKS
    final_auc_external_similarity = roc_auc_score(1 - df['hallucination_label'], df['external_similarity_sum'])
    final_auc_parameter_knowledge_difference = roc_auc_score(df['hallucination_label'], df['parameter_knowledge_difference_sum'])

    # PCC of the aggregated ECS and PKS
    final_pearson_external_similarity, _ = pearsonr(df['external_similarity_sum'], 1 - df['hallucination_label'])
    final_pearson_parameter_knowledge_difference, _ = pearsonr(df['parameter_knowledge_difference_sum'], df['hallucination_label'])

    # collect the results
    results = {
        f"Top {top_n} AUC External Similarity": final_auc_external_similarity,
        f"Top {top_n} AUC Parameter Knowledge Difference": final_auc_parameter_knowledge_difference,
        f"Top {top_k} Pearson Correlation External Similarity": final_pearson_external_similarity,
        f"Top {top_k} Pearson Correlation Parameter Knowledge Difference": final_pearson_parameter_knowledge_difference
    }

    # min-max normalization
    scaler = MinMaxScaler()

    # normalize the aggregated ECS column
    df['external_similarity_sum_normalized'] = scaler.fit_transform(df[['external_similarity_sum']])
    external_similarity_sum_max_value = scaler.data_max_[0]
    external_similarity_sum_min_value = scaler.data_min_[0]
    collect_info.update({"head_max_min": [external_similarity_sum_max_value, external_similarity_sum_min_value]})

    # normalize the aggregated PKS column
    df['parameter_knowledge_difference_sum_normalized'] = scaler.fit_transform(df[['parameter_knowledge_difference_sum']])
    parameter_knowledge_sum_max_value = scaler.data_max_[0]
    parameter_knowledge_sum_min_value = scaler.data_min_[0]
    collect_info.update({"layers_max_min": [parameter_knowledge_sum_max_value, parameter_knowledge_sum_min_value]})

    # linearly combine ECS and PKS into difference_normalized
    df['difference_normalized'] = m * df['parameter_knowledge_difference_sum_normalized'] - alpha * df['external_similarity_sum_normalized']

    # AUC and PCC of difference_normalized
    auc_difference_normalized = roc_auc_score(df['hallucination_label'], df['difference_normalized'])
    person_difference_normalized, _ = pearsonr(df['hallucination_label'], df['difference_normalized'])
    results.update({"Normalized Difference AUC": auc_difference_normalized})
    results.update({"Normalized Difference Pearson Correlation": person_difference_normalized})

    # convert token-level predictions into a response-level evaluation
    df['response_group'] = df['identifier'].str.extract(r'(response_\d+)') # keep only the response index, ignoring the token index
    grouped_df = df.groupby('response_group').agg( # group by response_group and compute aggregate statistics per group
        difference_normalized_mean=('difference_normalized', 'mean'), # mean of difference_normalized
        hallucination_label=('hallucination_label', 'max') # a single hallucinated token marks the whole response as hallucinated
    ).reset_index()

    # normalize the grouped means
    min_val = grouped_df['difference_normalized_mean'].min()
    max_val = grouped_df['difference_normalized_mean'].max()
    collect_info.update({'final_max_min': [max_val, min_val]})
    grouped_df['difference_normalized_mean_norm'] = (grouped_df['difference_normalized_mean'] - min_val) / (max_val - min_val)

    # response-level AUC and PCC
    auc_difference_normalized = roc_auc_score(grouped_df['hallucination_label'], grouped_df['difference_normalized_mean_norm'])
    person_difference_normalized, _ = pearsonr(grouped_df['hallucination_label'], grouped_df['difference_normalized_mean_norm'])

    results.update({"Grouped means AUC": auc_difference_normalized})
    results.update({"Grouped means Pearson Correlation": person_difference_normalized})

    print(collect_info)
    print(results)
    print(df.iloc[:, 66:])
    print(grouped_df)

    return auc_difference_normalized, person_difference_normalized

# i: keep the i ECS features with the highest AUC; j: keep the j PKS features with the highest AUC
# k: weight (alpha) on the ECS term; m: weight on the PKS term
i, j, k, m = 1, 10, 0.2, 1
auc_difference_normalized, person_difference_normalized = calculate_auc_pcc_32_32(df, i, j, k, auc_external_similarity, auc_parameter_knowledge_difference, m)

[(0.5673763213378886, 'external_similarity_25')]
[(0.5692361568028578, 'parameter_knowledge_difference_23'), (0.5659032649245824, 'parameter_knowledge_difference_24'), (0.5648406814639895, 'parameter_knowledge_difference_21'), (0.5641498623795829, 'parameter_knowledge_difference_25'), (0.5637065413888521, 'parameter_knowledge_difference_16'), (0.5603452361775584, 'parameter_knowledge_difference_17'), (0.559005634263019, 'parameter_knowledge_difference_28'), (0.5589974486374809, 'parameter_knowledge_difference_18'), (0.5557532354685331, 'parameter_knowledge_difference_20'), (0.5528760122772071, 'parameter_knowledge_difference_29')]
{'select_heads': [[25, 0]], 'select_layers': [23, 24, 21, 25, 16, 17, 28, 18, 20, 29], 'head_max_min': [0.70703125, -0.06622314453125], 'layers_max_min': [403.1658172607422, 0.0], 'final_max_min': [0.019226928463994836, -0.0883788238921643]}
{'Top 1 AUC External Similarity': 0.5673763213378886, 'Top 1 AUC Parameter Knowledge Difference': 0.583095315694985, 'Top 10 Pearson Correlation External Similarity': 0.05559519158826022, 'Top 10 Pearson Correlation Parameter Knowledge Difference': 0.041942949285387915, 'Normalized Difference AUC': 0.5923123608321358, 'Normalized Difference Pearson Correlation': 0.058753678172965486, 'Grouped means AUC': 0.732498419721871, 'Grouped means Pearson Correlation': 0.39790584030340576}
    external_similarity_sum  parameter_knowledge_difference_sum  \
0                     0.315674                           10.013580   
1                     0.219116                           18.298626   
2                     0.395020                           12.934208   
3                     0.325684                            9.894371   
4                     0.328369                            1.430511   
...                        ...                                 ...   
88401                 0.264160                           10.788441   
88402                 0.242554                           16.868114   
88403                 0.427490                            4.231930   
88404                 0.212402                            6.914139   
88405                 0.335205                           40.113926   

    external_similarity_sum_normalized  \
0                                0.493883   
1                                0.369011   
2                                0.596495   
3                                0.506828   
4                                0.510301   
...                                   ...   
88401                            0.427263   
88402                            0.399321   
88403                            0.638488   
88404                            0.360328   
88405                            0.519141   

    parameter_knowledge_difference_sum_normalized  difference_normalized  \
0                                           0.024837              -0.073939   
1                                           0.045387              -0.028415   
2                                           0.032082              -0.087217   
3                                           0.024542              -0.076824   
4                                           0.003548              -0.098512   
...                                              ...                    ...   
88401                                       0.026759              -0.058693   
88402                                       0.041839              -0.038025   
88403                                       0.010497              -0.117201   
88404                                       0.017150              -0.054916   
88405                                       0.099497              -0.004331   

    response_group  
0         response_0  
1         response_0  
2         response_0  
3         response_0  
4         response_0  
...              ...  
88401   response_449  
88402   response_449  
88403   response_449  
88404   response_449  
88405   response_449  

[88406 rows x 6 columns]
    response_group  difference_normalized_mean  hallucination_label  \
0       response_0                   -0.050396                    0   
1       response_1                   -0.036515                    0   
2      response_10                   -0.028513                    0   
3     response_100                   -0.035463                    0   
4     response_101                   -0.043027                    0   
..             ...                         ...                  ...   
445    response_95                   -0.048167                    1   
446    response_96                    0.005520                    0   
447    response_97                   -0.041191                    0   
448    response_98                   -0.027022                    1   
449    response_99                   -0.025304                    1   

    difference_normalized_mean_norm  
0                           0.352977  
1                           0.481976  
2                           0.556339  
3                           0.491755  
4                           0.421465  
..                               ...  
445                         0.373694  
446                         0.872623  
447                         0.438529  
448                         0.570205  
449                         0.586166  

[450 rows x 4 columns]
  1. Inspect the results:
result_dict = {"auc": auc_difference_normalized, "pcc": person_difference_normalized}
print(result_dict)

{'auc': 0.732498419721871, 'pcc': 0.39790584030340576}
  1. Save the results:
save_path = "../output/ReDeEP_token.json"
with open(save_path, 'w') as f:
    json.dump(result_dict, f, ensure_ascii=False)

2.3.2 chunk level

  1. Import the required packages:
import pandas as pd
import json
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score
from scipy.stats import pearsonr
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import accuracy_score
  1. Load source_info:
source_info_path = "../dataset/source_info_chunk.jsonl"
source_info_dict = {}

with open(source_info_path, 'r') as f:
    for line in f:
        data = json.loads(line)
        source_info_dict[data['source_id']] = data

print(len(source_info_dict))

2965
  1. Build the dataset:
def construct_dataframe(file_path, number):
    with open(file_path, "r") as f:
        response = json.load(f)

    data_dict = {
        "identifier": [],
        "type": [], # task type
        **{f"external_similarity_{k}": [] for k in range(number)},
        **{f"parameter_knowledge_difference_{k}": [] for k in range(number)},
        "hallucination_label": []
    }

    for i, resp in enumerate(response): # iterate over responses
        if resp["split"] != "test":
            continue
        respond_ids = resp["source_id"]
        rep_type = source_info_dict[respond_ids]["task_type"] # get the task type

        for j in range(len(resp["scores"])): # iterate over response spans
            data_dict["identifier"].append(f"response_{i}_item_{j}")
            data_dict["type"].append(rep_type)
            for k in range(number):
                data_dict[f"external_similarity_{k}"].append(list(resp["scores"][j]["prompt_attention_score"].values())[k])
                data_dict[f"parameter_knowledge_difference_{k}"].append(list(resp["scores"][j]["parameter_knowledge_scores"].values())[k])
            data_dict["hallucination_label"].append(resp["scores"][j]["hallucination_label"])
            if i == len(response) - 1: # record the mapping from column names to copy heads and layers
                ext_map_dict = {f"external_similarity_{k}": list(resp["scores"][j]["prompt_attention_score"].keys())[k] for k in range(number)}
                para_map_dict = {f"parameter_knowledge_difference_{k}": list(resp["scores"][j]["parameter_knowledge_scores"].keys())[k] for k in range(number)}

    df = pd.DataFrame(data_dict)

    print(df["hallucination_label"].value_counts(normalize=True)) # inspect the hallucination-label ratio
    return df, ext_map_dict, para_map_dict

data_path = "../dataset/llama2_7B_response_chunk.json"
number = 32
df, ext_map_dict, para_map_dict = construct_dataframe(data_path, number)
print(ext_map_dict)
print(para_map_dict)

hallucination_label
0    0.748921
1    0.251079
Name: proportion, dtype: float64
{'external_similarity_0': '(1, 14)', 'external_similarity_1': '(1, 21)', 'external_similarity_2': '(1, 27)', 'external_similarity_3': '(2, 5)', 'external_similarity_4': '(2, 22)', 'external_similarity_5': '(3, 0)', 'external_similarity_6': '(3, 19)', 'external_similarity_7': '(5, 13)', 'external_similarity_8': '(5, 29)', 'external_similarity_9': '(10, 20)', 'external_similarity_10': '(13, 20)', 'external_similarity_11': '(15, 7)', 'external_similarity_12': '(16, 1)', 'external_similarity_13': '(18, 9)', 'external_similarity_14': '(18, 10)', 'external_similarity_15': '(18, 13)', 'external_similarity_16': '(19, 1)', 'external_similarity_17': '(20, 1)', 'external_similarity_18': '(20, 5)', 'external_similarity_19': '(20, 15)', 'external_similarity_20': '(20, 17)', 'external_similarity_21': '(20, 22)', 'external_similarity_22': '(22, 10)', 'external_similarity_23': '(23, 8)', 'external_similarity_24': '(23, 30)', 'external_similarity_25': '(25, 0)', 'external_similarity_26': '(27, 9)', 'external_similarity_27': '(28, 18)', 'external_similarity_28': '(31, 18)', 'external_similarity_29': '(31, 19)', 'external_similarity_30': '(31, 24)', 'external_similarity_31': '(31, 28)'}
{'parameter_knowledge_difference_0': 'layer_0', 'parameter_knowledge_difference_1': 'layer_1', 'parameter_knowledge_difference_2': 'layer_2', 'parameter_knowledge_difference_3': 'layer_3', 'parameter_knowledge_difference_4': 'layer_4', 'parameter_knowledge_difference_5': 'layer_5', 'parameter_knowledge_difference_6': 'layer_6', 'parameter_knowledge_difference_7': 'layer_7', 'parameter_knowledge_difference_8': 'layer_8', 'parameter_knowledge_difference_9': 'layer_9', 'parameter_knowledge_difference_10': 'layer_10', 'parameter_knowledge_difference_11': 'layer_11', 'parameter_knowledge_difference_12': 'layer_12', 'parameter_knowledge_difference_13': 'layer_13', 'parameter_knowledge_difference_14': 'layer_14', 'parameter_knowledge_difference_15': 'layer_15', 'parameter_knowledge_difference_16': 'layer_16', 'parameter_knowledge_difference_17': 'layer_17', 'parameter_knowledge_difference_18': 'layer_18', 'parameter_knowledge_difference_19': 'layer_19', 'parameter_knowledge_difference_20': 'layer_20', 'parameter_knowledge_difference_21': 'layer_21', 'parameter_knowledge_difference_22': 'layer_22', 'parameter_knowledge_difference_23': 'layer_23', 'parameter_knowledge_difference_24': 'layer_24', 'parameter_knowledge_difference_25': 'layer_25', 'parameter_knowledge_difference_26': 'layer_26', 'parameter_knowledge_difference_27': 'layer_27', 'parameter_knowledge_difference_28': 'layer_28', 'parameter_knowledge_difference_29': 'layer_29', 'parameter_knowledge_difference_30': 'layer_30', 'parameter_knowledge_difference_31': 'layer_31'}
  1. Compute the AUC and PCC of ECS and PKS:
def calculate_auc_pcc(df, ext_map_dict, para_map_dict, number):
    auc_external_similarity = []
    pearson_external_similarity = []

    auc_parameter_knowledge_difference = []
    pearson_parameter_knowledge_difference = []

    for k in range(number):
        auc_ext = roc_auc_score(1 - df['hallucination_label'], df[f'external_similarity_{k}'])
        pearson_ext, _ = pearsonr(df[f'external_similarity_{k}'], 1 - df['hallucination_label'])
        auc_external_similarity.append((auc_ext, f'external_similarity_{k}'))
        pearson_external_similarity.append((pearson_ext, f'external_similarity_{k}'))

        auc_param = roc_auc_score(df['hallucination_label'], df[f'parameter_knowledge_difference_{k}'])
        if df[f'parameter_knowledge_difference_{k}'].nunique() == 1:
            print(k)
        pearson_param, _ = pearsonr(df[f'parameter_knowledge_difference_{k}'], df['hallucination_label'])
        auc_parameter_knowledge_difference.append((auc_param, f'parameter_knowledge_difference_{k}'))
        pearson_parameter_knowledge_difference.append((pearson_param, f'parameter_knowledge_difference_{k}'))

    # rename the column names to the corresponding copy heads or layers
    auc_external_similarity_rename = [[a, ext_map_dict[k]] for a, k in auc_external_similarity]
    auc_parameter_knowledge_difference_rename = [[a, para_map_dict[k]] for a, k in auc_parameter_knowledge_difference]

    return auc_external_similarity, auc_external_similarity_rename, auc_parameter_knowledge_difference, auc_parameter_knowledge_difference_rename

auc_external_similarity, _, auc_parameter_knowledge_difference, _ = calculate_auc_pcc(df, ext_map_dict, para_map_dict, number)
print(auc_external_similarity)
print(auc_parameter_knowledge_difference)

[(0.5417913756789713, 'external_similarity_0'), (0.5143078847767907, 'external_similarity_1'), (0.5400969167181339, 'external_similarity_2'), (0.5450100559013096, 'external_similarity_3'), (0.5260186548846342, 'external_similarity_4'), (0.5401800560596703, 'external_similarity_5'), (0.5466134574880834, 'external_similarity_6'), (0.5124511061491441, 'external_similarity_7'), (0.4991488115033177, 'external_similarity_8'), (0.523128573012178, 'external_similarity_9'), (0.5573978177902355, 'external_similarity_10'), (0.530745720303419, 'external_similarity_11'), (0.5855345463759165, 'external_similarity_12'), (0.5388973347902513, 'external_similarity_13'), (0.5480901705544206, 'external_similarity_14'), (0.5474488099197111, 'external_similarity_15'), (0.5465342771628106, 'external_similarity_16'), (0.5572513341884808, 'external_similarity_17'), (0.5647813831219219, 'external_similarity_18'), (0.608108857111185, 'external_similarity_19'), (0.5305081793276007, 'external_similarity_20'), (0.5807520547294408, 'external_similarity_21'), (0.5671765879614233, 'external_similarity_22'), (0.5516018179802683, 'external_similarity_23'), (0.4836096726685353, 'external_similarity_24'), (0.531842367808447, 'external_similarity_25'), (0.5856137267011893, 'external_similarity_26'), (0.5822010546819326, 'external_similarity_27'), (0.6184379305430188, 'external_similarity_28'), (0.5941652018306491, 'external_similarity_29'), (0.6002462508115983, 'external_similarity_30'), (0.586021505376344, 'external_similarity_31')]
[(0.6850483791787416, 'parameter_knowledge_difference_0'), (0.6821503792737581, 'parameter_knowledge_difference_1'), (0.6848246947598461, 'parameter_knowledge_difference_2'), (0.6746480434541626, 'parameter_knowledge_difference_3'), (0.675837727841386, 'parameter_knowledge_difference_4'), (0.6899852724594993, 'parameter_knowledge_difference_5'), (0.6916282642089093, 'parameter_knowledge_difference_6'), (0.6975430345067857, 'parameter_knowledge_difference_7'), (0.6916005510950639, 'parameter_knowledge_difference_8'), (0.6826294202416584, 'parameter_knowledge_difference_9'), (0.6991484156016913, 'parameter_knowledge_difference_10'), (0.6927130346651464, 'parameter_knowledge_difference_11'), (0.6973252886122856, 'parameter_knowledge_difference_12'), (0.7113124930717214, 'parameter_knowledge_difference_13'), (0.7076820751579647, 'parameter_knowledge_difference_14'), (0.704795952301772, 'parameter_knowledge_difference_15'), (0.7249156729535846, 'parameter_knowledge_difference_16'), (0.732604082537571, 'parameter_knowledge_difference_17'), (0.7431350657988504, 'parameter_knowledge_difference_18'), (0.6827204776157221, 'parameter_knowledge_difference_19'), (0.7435863936529051, 'parameter_knowledge_difference_20'), (0.7425926805707317, 'parameter_knowledge_difference_21'), (0.7210318779989547, 'parameter_knowledge_difference_22'), (0.7266378450282674, 'parameter_knowledge_difference_23'), (0.7004766655581421, 'parameter_knowledge_difference_24'), (0.6884056249703074, 'parameter_knowledge_difference_25'), (0.6712710025812786, 'parameter_knowledge_difference_26'), (0.6899971495082903, 'parameter_knowledge_difference_27'), (0.686271715204206, 'parameter_knowledge_difference_28'), (0.7507324180087731, 'parameter_knowledge_difference_29'), (0.7555505408016215, 'parameter_knowledge_difference_30'), (0.691774747810664, 'parameter_knowledge_difference_31')]
  1. Compute the response-level AUC and PCC:
def calculate_auc_pcc_32_32(df, top_n, top_k, alpha, auc_external_similarity, auc_parameter_knowledge_difference, m=1):
    top_auc_external_similarity = sorted(auc_external_similarity, reverse=True)[:top_n]
    print(top_auc_external_similarity)

    top_auc_parameter_knowledge_difference = sorted(auc_parameter_knowledge_difference, reverse=True)[:top_k]
    print(top_auc_parameter_knowledge_difference)

    df['external_similarity_sum'] = df[[col for _, col in top_auc_external_similarity]].sum(axis=1)
    df['parameter_knowledge_difference_sum'] = df[[col for _, col in top_auc_parameter_knowledge_difference]].sum(axis=1)

    final_auc_external_similarity = roc_auc_score(1 - df['hallucination_label'], df['external_similarity_sum'])
    final_auc_parameter_knowledge_difference = roc_auc_score(df['hallucination_label'], df['parameter_knowledge_difference_sum'])

    final_pearson_external_similarity, _ = pearsonr(df['external_similarity_sum'], 1 - df['hallucination_label'])
    final_pearson_parameter_knowledge_difference, _ = pearsonr(df['parameter_knowledge_difference_sum'], df['hallucination_label'])

    results = {
        f"Top {top_n} AUC External Similarity": final_auc_external_similarity,
        f"Top {top_k} N AUC Parameter Knowledge Difference": final_auc_parameter_knowledge_difference,
        f"Top {top_n} Pearson Correlation External Similarity": final_pearson_external_similarity,
        f"Top {top_k} Pearson Correlation Parameter Knowledge Difference": final_pearson_parameter_knowledge_difference
    }

    scaler = MinMaxScaler()

    df['external_similarity_sum_normalized'] = scaler.fit_transform(df[['external_similarity_sum']])

    df['parameter_knowledge_difference_sum_normalized'] = scaler.fit_transform(df[['parameter_knowledge_difference_sum']])

    df['difference_normalized'] = m * df['parameter_knowledge_difference_sum_normalized'] - alpha * df['external_similarity_sum_normalized']

    auc_difference_normalized = roc_auc_score(df['hallucination_label'], df['difference_normalized'])
    person_difference_normalized, _ = pearsonr(df['hallucination_label'], df['difference_normalized'])

    results.update({"Normalized Difference AUC": auc_difference_normalized})
    results.update({"Normalized Difference Pearson Correlation": person_difference_normalized})

    df['response_group'] = df['identifier'].str.extract(r'(response_\d+)')
    grouped_df = df.groupby('response_group').agg(
        difference_normalized_mean=('difference_normalized', 'mean'),
        hallucination_label=('hallucination_label', 'max'),
        resp_type=('type', 'first')
    ).reset_index()

    min_val = grouped_df['difference_normalized_mean'].min()
    max_val = grouped_df['difference_normalized_mean'].max()
    grouped_df['difference_normalized_mean_norm'] = (grouped_df['difference_normalized_mean'] - min_val) / (max_val - min_val)

    auc_difference_normalized = roc_auc_score(grouped_df['hallucination_label'], grouped_df['difference_normalized_mean_norm'])
    person_difference_normalized, _ = pearsonr(grouped_df['hallucination_label'], grouped_df['difference_normalized_mean_norm'])

    results.update({"Grouped means AUC": auc_difference_normalized})
    results.update({"Grouped means Pearson Correlation": person_difference_normalized})

    print(results)
    print(df.iloc[:, 67:])
    print(grouped_df)

    return auc_difference_normalized, person_difference_normalized

i, j, k, m = 3, 4, 0.6, 1
auc_difference_normalized, person_difference_normalized = calculate_auc_pcc_32_32(df, i, j, k, auc_external_similarity, auc_parameter_knowledge_difference, m)

[(0.6184379305430188, 'external_similarity_28'), (0.608108857111185, 'external_similarity_19'), (0.6002462508115983, 'external_similarity_30')]
[(0.7555505408016215, 'parameter_knowledge_difference_30'), (0.7507324180087731, 'parameter_knowledge_difference_29'), (0.7435863936529051, 'parameter_knowledge_difference_20'), (0.7431350657988504, 'parameter_knowledge_difference_18')]
{'Top 3 AUC External Similarity': 0.6122024799277875, 'Top 4 N AUC Parameter Knowledge Difference': 0.7696604747652304, 'Top 3 Pearson Correlation External Similarity': 0.1803052828551513, 'Top 4 Pearson Correlation Parameter Knowledge Difference': 0.41798093066978415, 'Normalized Difference AUC': 0.7716399828970497, 'Normalized Difference Pearson Correlation': 0.4246730120824572, 'Grouped means AUC': 0.747451801517067, 'Grouped means Pearson Correlation': 0.42077140374980926}
    external_similarity_sum  parameter_knowledge_difference_sum  \
0                    2.803634                           25.189286   
1                    2.527066                           29.186299   
2                    1.762983                           14.667820   
3                    2.516881                           30.397039   
4                    1.768611                           39.194237   
...                       ...                                 ...   
1154                 2.443222                           28.610018   
1155                 2.540920                            4.499433   
1156                 2.376156                           26.957270   
1157                 2.114817                            6.117686   
1158                 2.275002                            7.565648   

    external_similarity_sum_normalized  \
0                               0.915263   
1                               0.766601   
2                               0.355887   
3                               0.761126   
4                               0.358912   
...                                  ...   
1154                            0.721533   
1155                            0.774048   
1156                            0.685483   
1157                            0.545007   
1158                            0.631110   

    parameter_knowledge_difference_sum_normalized  difference_normalized  \
0                                          0.349987              -0.199171   
1                                          0.405634              -0.054326   
2                                          0.203504              -0.010028   
3                                          0.422491              -0.034185   
4                                          0.544968               0.329620   
...                                             ...                    ...   
1154                                       0.397611              -0.035308   
1155                                       0.061937              -0.402491   
1156                                       0.374601              -0.036689   
1157                                       0.084467              -0.242537   
1158                                       0.104626              -0.274040   

    response_group  
0        response_0  
1        response_1  
2        response_1  
3        response_2  
4        response_2  
...             ...  
1154   response_448  
1155   response_449  
1156   response_449  
1157   response_449  
1158   response_449  

[1159 rows x 6 columns]
    response_group  difference_normalized_mean  hallucination_label resp_type  \
0       response_0                   -0.199171                    0   Summary   
1       response_1                   -0.032177                    0   Summary   
2      response_10                   -0.053839                    0   Summary   
3     response_100                   -0.064660                    0   Summary   
4     response_101                   -0.007987                    0   Summary   
..             ...                         ...                  ...       ...   
445    response_95                    0.040043                    1   Summary   
446    response_96                   -0.003845                    0   Summary   
447    response_97                   -0.021714                    0   Summary   
448    response_98                    0.045525                    1   Summary   
449    response_99                   -0.075167                    1   Summary   

    difference_normalized_mean_norm  
0                           0.153055  
1                           0.311814  
2                           0.291220  
3                           0.280932  
4                           0.334811  
..                               ...  
445                         0.380473  
446                         0.338749  
447                         0.321761  
448                         0.385684  
449                         0.270944  

[450 rows x 5 columns]
  1. Inspect the results:
result_dict = {"auc":auc_difference_normalized, "pcc": person_difference_normalized}
print(result_dict)

{'auc': 0.747451801517067, 'pcc': 0.42077140374980926}
  1. Save the results:
save_path = "../output/ReDeEP_chunk.json"
with open(save_path, 'w') as f:
    json.dump(result_dict, f, ensure_ascii=False)

2.4 AARF.py

  1. Import the required packages:
import sys
sys.path.insert(0, '../transformers/src')
import torch
import json
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM
  1. Load source_info:
source_info_path = "../dataset/source_info.jsonl"
source_info_dict = {}

with open(source_info_path, 'r') as f:
    for line in f:
        data = json.loads(line)
        source_info_dict[data['source_id']] = data

print(json.dumps(source_info_dict, ensure_ascii=False, indent=4))

{
    "15596": {
        "source_id": "15596",
        "task_type": "Summary",
        "source": "CNN/DM",
        "source_info": "The FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah.\" One Twitter message said, \"If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs].\" Another said, \"When you're a mujahid [violent jihadi fighter] your death becomes a wedding.\" The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. \"The terrorist threat is more decentralized, more diffuse, more complicated,\" Homeland Security Secretary Jeh Johnson told reporters Thursday. \"It involves the potential lone wolf actor, it involves the effective use of social media, the Internet.\"\n",
        "prompt": "Summarize the following news within 86 words:\nThe FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah.\" One Twitter message said, \"If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs].\" Another said, \"When you're a mujahid [violent jihadi fighter] your death becomes a wedding.\" The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. \"The terrorist threat is more decentralized, more diffuse, more complicated,\" Homeland Security Secretary Jeh Johnson told reporters Thursday. \"It involves the potential lone wolf actor, it involves the effective use of social media, the Internet.\"\n\noutput:"
    }
}
  3. Get the source_id of each test response:

source_id_list = []

response_path = "../dataset/response.jsonl"

with open(response_path, 'r') as f:
    for line in f:
        data = json.loads(line)
        if data["split"] == "test":
            source_id_list.append(data["source_id"])

print(json.dumps(source_id_list, ensure_ascii=False, indent=4))

[
    "15596"
]
  4. Get the source_info corresponding to each source_id:

test_datas_dict = {}

source_id_set = sorted(list(set(source_id_list)))

for item in source_id_set:
    test_datas_dict[item] = source_info_dict[item]

print(json.dumps(test_datas_dict, ensure_ascii=False, indent=4))

{
    "15596": {
        "source_id": "15596",
        "task_type": "Summary",
        "source": "CNN/DM",
        "source_info": "The FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah.\" One Twitter message said, \"If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs].\" Another said, \"When you're a mujahid [violent jihadi fighter] your death becomes a wedding.\" The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. \"The terrorist threat is more decentralized, more diffuse, more complicated,\" Homeland Security Secretary Jeh Johnson told reporters Thursday. \"It involves the potential lone wolf actor, it involves the effective use of social media, the Internet.\"\n",
        "prompt": "Summarize the following news within 86 words:\nThe FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She's one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as \"Young Lioness\" and \"Fatayat Al Khilafah.\" One Twitter message said, \"If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs].\" Another said, \"When you're a mujahid [violent jihadi fighter] your death becomes a wedding.\" The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It's not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department's National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. \"The terrorist threat is more decentralized, more diffuse, more complicated,\" Homeland Security Secretary Jeh Johnson told reporters Thursday. \"It involves the potential lone wolf actor, it involves the effective use of social media, the Internet.\"\n\noutput:"
    }
}
  5. Load the hyperparameters:

save_path = "../dataset/token_hyperparameter.json"

with open(save_path, "r") as f:
    hypter_parameter = json.load(f)

hypter_parameter

{'select_heads': [[25, 0]],
 'select_layers': [23, 24, 21, 25, 16, 17, 28, 18, 20, 29],
 'head_max_min': [0.70703125, -0.06622314453125],
 'layers_max_min': [403.1658172607422, 0.0],
 'final_max_min': [0.019226928463994836, -0.0883788238921643],
 'weight': 0.2}
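
Each *_max_min pair stores [max, min] statistics collected in advance; they are presumably used to min-max normalize the copy-head score and the FFN-layer score before the two are combined with weight. A hedged sketch of that idea (the real logic lives in the modified transformers package; attn_score, ffn_score, and the combination below are illustrative placeholders, not the package's actual code):

def min_max_norm(x, max_min):
    max_v, min_v = max_min                                   # pairs are stored as [max, min]
    return (x - min_v) / (max_v - min_v)

attn_score, ffn_score = 0.35, 150.0                          # placeholder raw scores for one token
attn_norm = min_max_norm(attn_score, hypter_parameter["head_max_min"])    # copy-head (external context)
ffn_norm = min_max_norm(ffn_score, hypter_parameter["layers_max_min"])    # FFN (parametric knowledge)
score = min_max_norm(hypter_parameter["weight"] * ffn_norm - attn_norm,
                     hypter_parameter["final_max_min"])      # hypothetical combination
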
  6. Define variables:

select_layers = hypter_parameter["select_layers"]
select_heads = hypter_parameter["select_heads"]
layers_max_min = hypter_parameter["layers_max_min"]
head_max_min = hypter_parameter["head_max_min"]
weight = hypter_parameter["weight"]
final_max_min = hypter_parameter["final_max_min"]

data_type = "llama-2-7b-chat"

model_name = "../../model/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

  7. Load the model; the extra keyword arguments are consumed by the modified transformers package:

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    select_layers=select_layers,
    select_heads=select_heads,
    layers_max_min=layers_max_min,
    head_max_min=head_max_min,
    weight=weight,
    final_max_min=final_max_min
)
model.add_attention_weight = 1.2
model.reduce_ffn_weight = 0.8
model.threshold = 0.6

Loading checkpoint shards: 100%|██████████| 2/2 [00:45<00:00, 22.97s/it]
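
The three attributes set above drive AARF (Add Attention, Reduce FFN): during decoding, whenever a token's hallucination score crosses threshold, the attention output is scaled up by add_attention_weight and the FFN output is scaled down by reduce_ffn_weight. A minimal sketch of that residual update, assuming a simplified decoder layer (names and structure are illustrative, not the package's actual code):

import torch

def aarf_residual_update(hidden, attn_out, ffn_out, score,
                         add_attention_weight=1.2,
                         reduce_ffn_weight=0.8,
                         threshold=0.6):
    # `score` is this token's normalized hallucination score; past the
    # threshold, amplify the attention branch (external context) and
    # damp the FFN branch (parametric knowledge).
    if score > threshold:
        attn_out = add_attention_weight * attn_out
        ffn_out = reduce_ffn_weight * ffn_out
    hidden = hidden + attn_out   # residual add after self-attention
    hidden = hidden + ffn_out    # residual add after the FFN
    return hidden
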
  8. Build the model input:

def add_special_template(prompt):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    return text

final_datas = []

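For Llama-2-chat, apply_chat_template wraps the messages in the model's [INST]/<<SYS>> format; roughly (illustrative output, not captured from the run):

print(add_special_template("Hello"))
# <s>[INST] <<SYS>>
# You are a helpful assistant.
# <</SYS>>
#
# Hello [/INST]
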
  9. Run model inference (each prompt is truncated to its first 8000 characters):

for key, prompt in tqdm(test_datas_dict.items()):
    text = add_special_template(prompt["prompt"][:8000])
    input_ids = tokenizer(text, return_tensors="pt").input_ids.to("cuda")
    model.prefix_len = input_ids.shape[-1]
    print("input_ids", input_ids.shape)
    outputs = model.generate(
        input_ids,
        do_sample=False,
        temperature=None,
        top_p=None,
        max_new_tokens=1024
    )

    response = outputs[0][input_ids.shape[-1]:]
    result = tokenizer.decode(response, skip_special_tokens=True)
    print(result)
    final_datas.append({"id": key, "prompt": prompt["prompt"], "response": result})

0%|          | 0/1 [00:00<?, ?it/s]
input_ids torch.Size([1, 564])
100%|██████████| 1/1 [00:07<00:00,  7.60s/it]
FBI charges Philadelphia woman, Keonna Thomas, with attempting to provide material support to ISIS. She purchased an electronic visa to Turkey and had social media messages expressing desire to join ISIS. Two other women were arrested in New York on similar charges. The FBI has prosecuted or is prosecuting over 30 cases of people attempting to travel abroad to join or provide support to terrorist groups, with 18 involving ISIS.
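
Because do_sample=False is set (with temperature and top_p cleared), decoding is greedy and the run is deterministic. model.prefix_len records the prompt length, presumably so the modified decoding code applies AARF only to newly generated tokens rather than to the prompt.
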
  10. Save and view the results:

with open(f"../output/AARF_add_{model.add_attention_weight}_reduce_{model.reduce_ffn_weight}_threshold_{model.threshold}.json", "w") as f:
    json.dump(final_datas, f, indent=4, ensure_ascii=False)
final_datas

[{'id': '15596',
'prompt': 'Summarize the following news within 86 words:\nThe FBI charged a Philadelphia woman on Thursday with trying to travel overseas to fight for ISIS. She\'s one of three women arrested this week on terror charges. Two New York women were also taken into custody. An FBI complaint cites numerous social media messages dating back to August 2013 that were sent by Keonna Thomas, 30, also known as "Young Lioness" and "Fatayat Al Khilafah." One Twitter message said, "If we truly knew the realities ... we all would be rushing to join our brothers in the front lines pray ALLAH accept us as shuhada [martyrs]." Another said, "When you\'re a mujahid [violent jihadi fighter] your death becomes a wedding." The FBI said Thomas purchased an electronic visa to Turkey on March 23. Turkey is known as the easiest place from which to enter Syria and join ISIS. An ISIS manual advises recruits to buy round-trip tickets to vacation spots such as Spain and then purchase tickets for their real destination once they arrive overseas, the FBI said. On March 26, Thomas purchased a ticket to Barcelona, with a March 29 departure and an April 15 return to the United States, the complaint said. It\'s not clear when or where she was arrested. She was charged with knowingly attempting to provide material support and resources to a designated foreign terrorist organization. She could be sentenced to 15 years in prison. On Thursday, Noelle Velentzas, 28, and her former roommate, Asia Siddiqui, 31, were arrested in New York and accused of planning to build an explosive device for attacks in the United States, federal prosecutors said. In the past 18 months, the Justice Department\'s National Security Division has prosecuted or is prosecuting more than 30 cases of people attempting to travel abroad to join or provide support to terrorist groups. Of those cases, 18 allegedly involve support to ISIS. "The terrorist threat is more decentralized, more diffuse, more complicated," Homeland Security Secretary Jeh Johnson told reporters Thursday. "It involves the potential lone wolf actor, it involves the effective use of social media, the Internet."\n\noutput:',
'response': ' FBI charges Philadelphia woman, Keonna Thomas, with attempting to provide material support to ISIS. She purchased an electronic visa to Turkey and had social media messages expressing desire to join ISIS. Two other women were arrested in New York on similar charges. The FBI has prosecuted or is prosecuting over 30 cases of people attempting to travel abroad to join or provide support to terrorist groups, with 18 involving ISIS.'}]
