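Introduction

As large language models mature and their applications expand, using them at scale becomes a practical problem: proofreading a 100,000-character text by typing prompts in by hand is not realistic. The first step is therefore to make input and output (I/O) modular and batchable.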

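About LangChain

LangChain is a toolkit for building applications on large language models. Its modular design lets users integrate a range of natural language processing tasks into their applications. LangChain supports multiple language models and APIs, including OpenAI's GPT-4, so different models can be used for different tasks.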

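LangChain's main features include:

Modular design: users can select and combine modules as needed to carry out a specific task.

Batch processing: support for large-scale data processing lets users work through large volumes of text efficiently.

Easy integration: a concise API makes it straightforward to connect to existing systems.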

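In this article we use LangChain to make I/O modular and batchable, applied to the task of evaluating translation quality with GEMBA-MQM. We walk step by step through setting up the environment, calling the API, and running the GEMBA-MQM evaluation.

Notes:

This article uses Google Colab as the runtime; the code can also be run in a local ipynb notebook through a domestic API relay service. LangChain recommends running in an ipynb notebook.
Part of the code comes from course handouts by Han Lintao (韩林涛) of Beijing Language and Culture University and has been adapted for GEMBA-MQM.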

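Environment setup and installation

Step 1: Install the required Python packages

First, install the LangChain and OpenAI Python packages. Open a terminal and run the following command (in a Colab or ipynb cell, prefix it with !):

pip install langchain-openai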

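Step 2: Set the environment variables

Next, set the OpenAI API key and base URL. Add the following code to your Python script and replace the placeholders with your actual API key and URL.

import os

# Replace the placeholders with your own API key and endpoint
os.environ["OPENAI_API_KEY"] = "sk-***"
os.environ["OPENAI_BASE_URL"] = "https://***"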

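Step 3: Initialize the ChatOpenAI object

We use the ChatOpenAI class to create a language model object. This object calls the OpenAI API; here we select the GPT-4 Turbo model.

from langchain_openai import ChatOpenAI

# The API key and base URL are read from the environment variables set in Step 2
llm = ChatOpenAI(model="gpt-4-turbo")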

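Step 4: Create the ChatPromptTemplate

We use ChatPromptTemplate to build a prompt template for evaluating translation quality. The template contains a system message and a user message: the system message defines the model's role and task, and the user message carries the user's input.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    # System message: role and task of the model
    ("system", "You are an annotator for the quality of machine translation. Your task is to identify errors and assess the quality of the translation with MQM method."),
    # User message: {input} is filled in when the chain is invoked
    ("user", "{input}")
])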

Defining the data model

Step 5: Define the structured output data model

To get structured data back, we define a data model, TranslationError, with the Pydantic library. It holds the source sentence that contains the problem, its target-language translation, the error type, the error severity, and an explanation. The field descriptions act as prompts and are based on GEMBA-MQM. To return more than one error, we then define a second model, TranslationErrorReport, which holds a list of TranslationError objects.

# Note: newer langchain-core releases may deprecate langchain_core.pydantic_v1;
# if this import warns or fails, try `from pydantic import BaseModel, Field` instead.
from langchain_core.pydantic_v1 import BaseModel, Field

class TranslationError(BaseModel):
    source: str = Field(description="The source sentence of the error")
    target: str = Field(description="The target sentence of the error")
    error_type: str = Field(description="There are 7 types of error: accuracy (addition, mistranslation, omission, untranslated text), fluency (character encoding, grammar, inconsistency, punctuation, register, spelling), locale convention (currency, date, name, telephone, or time format), style (awkward), terminology (inappropriate for context, inconsistent use), non-translation, other.")
    severity: str = Field(description="Each error is classified as one of three severities: critical, major, and minor. Critical errors inhibit comprehension of the text. Major errors disrupt the flow, but what the text is trying to say is still understandable. Minor errors are technically errors, but do not disrupt the flow or hinder comprehension")
    reason: str = Field(description="The brief and precise explanation of the error reason based on the error type.")

class TranslationErrorReport(BaseModel):
    # One report can carry several errors
    errors: list[TranslationError] = Field(description="List of translation errors")

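Generating structured output

Step 6: Create a language model with structured output

We use the with_structured_output method to bind the data model defined above to the language model:

structured_llm = llm.with_structured_output(TranslationErrorReport)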

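Step 7: Combine the prompt template and the structured language model

Combine the prompt template and the structured language model into one complete processing chain:

chain = prompt | structured_llm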

Data input and evaluation

Step 8: Feed in the data and invoke the chain

A program that feeds in data automatically could be added at this point to support batch processing later. For now, we assemble the data into a single user input and invoke the chain on it.

source_language = "中文"
source_text = """中国数字经济空间关联网络结构及其影响因素
数字经济畅通了经济循环，促进了区域经济协调发展。通过测算2006-2019年中国省级数字经济核心产业增加值，利用SNA分析方法对中国省际数字经济空间关联及其影响因素进行了实证考察。"""

target_language = "English"
target_text = """Associative Network Structure of China Digital Economy Space and Its Influence Factors
Digital Economy smooths the cycle of economy and develops the regional economy coordinately. We have done an survey with SAA analyze method on the spatial relation of China’s cross-province digital economy and its influence factors by estimating the added value of core industries of China’s provincial digital economy."""

# Join the source and the translation into one user message
input_data = source_language + ": " + source_text + "; \n" + target_language + ": " + target_text

# Pass the text explicitly as the template's {input} variable
response = chain.invoke({"input": input_data})
print(response)
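The automatic data input mentioned in Step 8 could be sketched as follows. This is only a minimal sketch under stated assumptions: the Segment tuple layout and the helper names build_input and evaluate_batch are hypothetical and not part of the original code, and chain is assumed to be the LangChain chain from Step 7, whose batch method accepts a list of inputs.

```python
from typing import Any, Iterable, List, Tuple

# Hypothetical record layout (an assumption, not from the original article):
# (source_language, source_text, target_language, target_text)
Segment = Tuple[str, str, str, str]


def build_input(segment: Segment) -> str:
    """Format one segment the same way input_data is built in Step 8."""
    source_language, source_text, target_language, target_text = segment
    return (
        source_language + ": " + source_text + "; \n"
        + target_language + ": " + target_text
    )


def evaluate_batch(chain: Any, segments: Iterable[Segment]) -> List[Any]:
    """Run every segment through the chain and return one report per segment."""
    # Each element fills the {input} variable of the prompt template.
    inputs = [{"input": build_input(segment)} for segment in segments]
    # batch() processes the whole list and returns results in input order.
    return chain.batch(inputs)
```

With the chain from Step 7, a call such as reports = evaluate_batch(chain, segments) would then return one TranslationErrorReport per segment, where segments could be loaded from a spreadsheet or a text file.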

Final result

errors=[
    TranslationError(
        source='数字经济畅通了经济循环，促进了区域经济协调发展。',
        target='Digital Economy smooths the cycle of economy and develops the regional economy coordinately.',
        error_type='Mistranslations',
        severity='Minor',
        reason="The phrase '畅通了经济循环' is translated as 'smooths the cycle of economy', while a more accurate translation would be 'facilitates the flow of the economy'. Similarly, '区域经济协调发展' is translated as 'develops the regional economy coordinately', while a more accurate translation would be 'promotes coordinated regional economic development'."
    ),
    TranslationError(
        source='通过测算2006-2019年中国省级数字经济核心产业增加值，利用SNA分析方法对中国省际数字经济空间关联及其影响因素进行了实证考察。',
        target='We have done an survey with SAA analyze method on the spatial relation of China’s cross-province digital economy and its influence factors by estimating the added value of core industries of China’s provincial digital economy.',
        error_type='Omission',
        severity='Major',
        reason="The acronym 'SNA' is mistranslated as 'SAA'. This can lead to confusion as 'SAA' might not exist or mean something different."
    )
]
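For batch use, a report like the one above is easier to review once it is flattened into table rows. The following is a minimal sketch, assuming only that a report exposes an errors list whose items carry the five fields defined in Step 5; the helper names report_to_rows and rows_to_csv are hypothetical.

```python
import csv
import io
from typing import Any, Iterable, List

# Field names taken from the TranslationError model in Step 5
FIELDS = ["source", "target", "error_type", "severity", "reason"]


def report_to_rows(report: Any) -> List[dict]:
    """Turn one TranslationErrorReport-like object into a list of flat dicts."""
    return [
        {name: getattr(error, name) for name in FIELDS}
        for error in report.errors
    ]


def rows_to_csv(rows: Iterable[dict]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling rows_to_csv(report_to_rows(response)) would give CSV text that can be written to a file and opened in a spreadsheet, one line per detected error.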

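Reflections and summary

The results point to the following two problems:

1. GPT's instruction following is still imperfect, or the prompt still has room for improvement. The error_type description defines seven error types following GEMBA-MQM and uses parentheses to list what each type covers. GPT, however, outputs the parenthesized sub-items (here 'Mistranslations' and 'Omission') as the type, presumably reading them as subcategories of the error types. In practice this does not make the result less useful. One option is to define error_type formally so that it can only take values from a fixed list.

2. Although we did not systematically select texts across multiple dimensions and levels, the passage used here comes from a China Daily economics article and is still a useful reference in terms of translation difficulty and technique. The evaluation found one clear terminology error (SNA rendered as SAA) and some unnatural phrasing, but it also has a gap: it did not notice that the English text omits the important time range (2006-2019). This shows that even with GPT-4, GEMBA-MQM does well on fluency and expression but can still miss accuracy errors.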
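The fixed list suggested in the first point could look like the sketch below. It is an illustration rather than a tested fix: ErrorType lists the seven top-level categories from the error_type description, and annotating the field as error_type: ErrorType in TranslationError should, as far as I know, make Pydantic reject any other value. The SUBTYPE_TO_TYPE table and the normalize_error_type helper are hypothetical additions for mapping sub-type labels that have already been returned back to their top-level category.

```python
from typing import Literal, get_args

# The seven top-level categories named in the error_type description
ErrorType = Literal[
    "accuracy", "fluency", "locale convention", "style",
    "terminology", "non-translation", "other",
]

# Hypothetical lookup: sub-type label -> top-level category
SUBTYPE_TO_TYPE = {
    "addition": "accuracy", "mistranslation": "accuracy",
    "omission": "accuracy", "untranslated text": "accuracy",
    "character encoding": "fluency", "grammar": "fluency",
    "inconsistency": "fluency", "punctuation": "fluency",
    "register": "fluency", "spelling": "fluency",
    "currency": "locale convention", "date": "locale convention",
    "name": "locale convention", "telephone": "locale convention",
    "time format": "locale convention",
    "awkward": "style",
    "inappropriate for context": "terminology",
    "inconsistent use": "terminology",
}


def normalize_error_type(label: str) -> str:
    """Map a label returned by the model to one of the seven categories."""
    key = label.strip().lower()
    if key in get_args(ErrorType):
        return key
    # Also try the singular form, e.g. 'mistranslations' -> 'mistranslation'
    for candidate in (key, key.rstrip("s")):
        if candidate in SUBTYPE_TO_TYPE:
            return SUBTYPE_TO_TYPE[candidate]
    # Anything unrecognized falls back to the catch-all category
    return "other"
```

Applied to the result above, both 'Mistranslations' and 'Omission' map to "accuracy". Constraining the field at definition time and normalizing afterwards are complementary: the first prevents drift in new runs, the second cleans up reports that already exist.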

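Using LangChain to interact with GPT amounts to combining systematic code with a creative model: the code constrains GPT, controllable variables can be inserted at any point in the logic flow so that GPT produces the expected output, and the whole process can be automated.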
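One way to pair deterministic code with GPT's judgment, prompted by the missed time range noted above, is a simple rule-based check that runs alongside the chain. The sketch below is an assumption of mine, not part of GEMBA-MQM or the original code: the hypothetical helper missing_numbers reports numbers that appear in the source text but not in the translation. It only compares digits, so numbers spelled out in words would be flagged as missing.

```python
import re
from typing import List

# Integers and simple decimals such as 2006 or 3.5
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")


def missing_numbers(source_text: str, target_text: str) -> List[str]:
    """Return numbers that occur in the source but nowhere in the target."""
    target_numbers = set(NUMBER_PATTERN.findall(target_text))
    missing: List[str] = []
    for number in NUMBER_PATTERN.findall(source_text):
        # Keep first-seen order and skip repeats
        if number not in target_numbers and number not in missing:
            missing.append(number)
    return missing


source = "通过测算2006-2019年中国省级数字经济核心产业增加值"
target = "by estimating the added value of core industries"
print(missing_numbers(source, target))  # → ['2006', '2019']
```

Any segment for which this list is non-empty could be flagged for manual review, regardless of what the model reports.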
