source: Machine learning algorithm full stack engineer

author: Monkang


editor: Wang Shu Wei

This article is about 6,050 words long; the recommended reading time is 10 minutes.
This article will help you understand the research methods and future trends of intelligent dialogue systems.




While reading recent research, the author came across a very good paper on dialogue systems, "A Survey on Dialogue Systems: Recent Advances and New Frontiers". The paper comes from the Jingdong (JD.com) data team, cites 124 recent papers, and gives a comprehensive and thorough introduction to dialogue systems. Today we walk through it for our readers.




Preface





A virtual assistant or chat companion with enough intelligence has long seemed illusory, something likely to exist only in science fiction films for a long time to come. In recent years, however, human-computer conversation has attracted more and more researchers because of its potential and its attractive commercial value.




With the development of big data and deep learning, building an automated human-computer conversation system to serve as a personal assistant or chat companion is no longer a fantasy.



Dialogue systems are now attracting growing attention in many fields, and steady progress in deep learning has greatly advanced them. For dialogue systems, deep learning can exploit large amounts of data to learn feature representations and response generation strategies while requiring only a small amount of hand-crafting.





Today, conversational "big data" is easy to obtain from the Internet, so we may be able to learn how to respond to almost any input. This makes it much easier to build data-driven, open-domain dialogue systems.





On the other hand, deep learning has proved effective at capturing complex patterns in big data and has driven a large number of research fields, such as computer vision, natural language processing, and recommender systems. In the paper, the authors summarize these recent advances in dialogue systems from several perspectives and discuss some possible research directions.


Specifically, dialogue systems can be roughly divided into two types:




*
Task-oriented dialogue systems, and




*
Non-task-oriented dialogue systems (also known as chatbots).




A task-oriented system aims to help the user complete a practical, specific task, for example finding a product or booking a hotel or restaurant.


The widely used approach in task-oriented systems is to treat response generation as a pipeline, as shown in the figure below:









The system first understands the message conveyed by the human and represents it as an internal state, then chooses a series of actions according to the policy for the current dialogue state, and finally converts the action into natural language.




Although language understanding is handled by statistical models, most deployed dialogue systems still rely on hand-crafted features or rules for state and action space representation, intent detection, and slot filling.
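The pipeline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword-matching `understand` function, the `CITIES` list, the action names, and the reply templates are all hypothetical stand-ins, not components from the survey.

```python
from dataclasses import dataclass, field

CITIES = ("new york", "boston")  # hypothetical toy gazetteer


@dataclass
class DialogueState:
    """Internal state: the slots collected so far."""
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> dict:
    """Toy NLU: keyword matching stands in for a statistical model."""
    text = utterance.lower()
    frame = {}
    if "hotel" in text:
        frame["intent"] = "book_hotel"
    for city in CITIES:
        if city in text:
            frame["city"] = city
    return frame


def track(state: DialogueState, frame: dict) -> DialogueState:
    """State tracking: merge the new frame into the dialogue state."""
    state.slots.update(frame)
    return state


def decide(state: DialogueState) -> str:
    """Rule-based policy: ask for the missing slot, otherwise confirm."""
    if "city" not in state.slots:
        return "request_city"
    return "confirm_booking"


def generate(action: str, state: DialogueState) -> str:
    """Template-based NLG: turn the chosen action into text."""
    templates = {
        "request_city": "Which city would you like to stay in?",
        "confirm_booking": "Booking a hotel in {city}.",
    }
    return templates[action].format(**state.slots)


def respond(state: DialogueState, utterance: str) -> str:
    """Run one turn through NLU, state tracking, policy, and NLG."""
    state = track(state, understand(utterance))
    return generate(decide(state), state)


state = DialogueState()
print(respond(state, "I need a hotel"))
print(respond(state, "In New York please"))
```

Each stage can be swapped for a learned model without touching the others, which is the main appeal of the pipeline design and also the reason errors in early stages propagate downstream.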



Non-task-oriented dialogue systems interact with humans to provide reasonable responses and entertainment, and generally focus on conversing with people in open domains. Although a non-task-oriented system appears to be merely chatting, it plays a role in many practical applications.

Data show that in online shopping scenarios nearly 80% of utterances are chit-chat, and how these utterances are handled is closely tied to the user experience.




Generally speaking, there are currently two main approaches to non-task-oriented dialogue systems:




*

Generative methods, such as sequence-to-sequence (seq2seq) models, which generate an appropriate reply during the conversation. Generative chatbots are a hot research topic. Unlike retrieval-based chatbots, they can produce entirely new replies and are therefore more flexible, but they also have shortcomings: they sometimes make grammatical errors or generate meaningless replies;




*

Retrieval-based methods, which learn to select a reply for the current conversation from a predefined index. The drawback of retrieval is that it depends heavily on data quality: if the underlying data is poor, all the other effort is likely to be wasted.









In recent years, the rapid development of big data and deep learning has greatly advanced both task-oriented and non-task-oriented dialogue systems.





In the paper, the authors' goals are to:




*
Give an overview of dialogue systems, especially recent advances based on deep learning;

*
Discuss possible research directions.




Task-oriented systems







Task-oriented dialogue systems are an important branch of dialogue systems. In this section, the authors survey the pipeline methods and end-to-end methods for task-oriented dialogue systems.




Pipeline methods




The typical structure of a task-oriented dialogue system is shown in the earlier figure. It consists of four key components:





*
Natural language understanding (NLU): parses the user input into predefined semantic slots.




Given an utterance, natural language understanding maps it into semantic slots. The slots are predefined for each scenario.









The figure above shows an example of a natural language representation, in which "New York" is the location specified as a slot value, and the domain and intent are specified as well. Typically there are two types of representation. One is the utterance-level category, such as the user's intent and the utterance category. The other is word-level information extraction, such as named entity recognition and slot filling. Intent detection identifies what the user wants by classifying the utterance into one of the predefined intents.




*
Dialogue state tracking (Dialogue State Tracker, DST). Dialogue state tracking is the core component that ensures the robustness of a dialogue system. It estimates the user's goal at every turn of the dialogue, managing each turn's input together with the dialogue history and outputting the current dialogue state. This typical state structure is often called slot filling or a semantic frame. Traditional methods, which are widely used in most commercial implementations, usually apply hand-crafted rules to select the most likely output. However, these rule-based systems are prone to frequent errors, because the most likely result is not always the desired one.



A recent deep learning approach uses a sliding window to output a sequence of probability distributions over an arbitrary number of possible values. Although trained in one domain, it transfers easily to new domains. The most commonly used models here are the multi-domain RNN dialog state tracking model and the Neural Belief Tracker (NBT).




*
Dialogue policy learning. Conditioned on the state representation from the state tracker, policy learning generates the next available system action. Both supervised learning and reinforcement learning can be used to optimize the policy. Supervised learning is performed on actions generated by rules: in an online shopping scenario, for example, if the dialogue state is "Recommend", the "Recommend" action is triggered and the system retrieves products from the product database. Reinforcement learning can then be introduced to further train the dialogue policy and guide the system toward its final policy. In the reported experiments, reinforcement learning outperforms both rule-based and supervised methods.



*
Natural language generation (NLG). It maps the selected action to its surface form and generates a reply.



A good generator usually relies on several factors: adequacy, fluency, readability, and variation. Conventional NLG methods typically perform sentence planning: they map the input semantic symbols into an intermediate form that represents the utterance, such as a tree or template structure, and then convert the intermediate structure into the final response through surface realization. A more mature deep learning approach is an LSTM-based encoder-decoder that combines the question information, the semantic slot values, and the dialogue act type to generate the correct answer. An attention mechanism is also used to attend to the key information given the decoder's current decoding state, so that different replies are generated for different act types.
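To make the first two components more concrete, here is a minimal sketch of how word-level slot tags can be collected into a semantic frame, and how a rule-based tracker can keep the most likely value for each slot. The IOB tag scheme, the label names (`city`, `date`, `book_flight`), and the confidence scores are assumptions chosen for illustration; they are not taken from the paper's figure.

```python
def tags_to_slots(tokens, tags):
    """Collect slot values from IOB tags (B-x starts a slot, I-x continues it)."""
    slots = {}
    current = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = token
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + token
        else:
            current = None  # an "O" tag (or a stray I- tag) closes the open slot
    return slots


def update_state(state, hypotheses):
    """Rule-based tracking: for each slot keep the highest-confidence value.

    state maps slot -> (value, confidence); hypotheses is a list of
    (slot, value, confidence) tuples produced by the NLU for one turn.
    """
    new_state = dict(state)
    for slot, value, confidence in hypotheses:
        best = new_state.get(slot)
        if best is None or confidence > best[1]:
            new_state[slot] = (value, confidence)
    return new_state


tokens = ["book", "a", "flight", "to", "New", "York", "tomorrow"]
tags = ["O", "O", "O", "O", "B-city", "I-city", "B-date"]
frame = {
    "domain": "travel",       # hypothetical utterance-level labels
    "intent": "book_flight",
    "slots": tags_to_slots(tokens, tags),
}
print(frame)

state = update_state({}, [("city", "New York", 0.6), ("city", "Newark", 0.3)])
print(state)
```

The tracker illustrates the weakness mentioned above: it always trusts the highest score, so a confident NLU error overwrites a correct earlier value.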




End-to-end methods





Although traditional task-oriented dialogue systems contain many domain-specific hand-crafted components, they are hard to adapt to new domains. In recent years, with the development of end-to-end neural generative models, end-to-end trainable frameworks have been built for task-oriented dialogue systems. Note that neural generative models are discussed in more detail in the section on non-task-oriented systems. Unlike the traditional pipeline model, an end-to-end model uses a single module and interacts with a structured external database.










The model above is a network-based, end-to-end trainable task-oriented dialogue system. It treats dialogue system learning as the problem of learning a mapping from dialogue history to system reply, and trains an encoder-decoder model to do so. However, the system is trained in a supervised fashion: it not only requires a lot of training data but, because the training data does not explore dialogue control further, may also fail to find a good policy.









With advances in reinforcement learning research, the model above was the first to propose an end-to-end reinforcement learning approach that jointly trains dialogue state tracking and dialogue policy learning within dialogue management, so that the system's actions are optimized more effectively.




Non-task-oriented systems





Unlike task-oriented dialogue systems, whose goal is to complete a specific task for the user, non-task-oriented dialogue systems (also known as chatbots) focus on conversing with people in open domains. Generally speaking, chatbots are implemented with either generative methods or retrieval-based methods.





Generative models can produce more appropriate replies that may never have appeared in the corpus, while retrieval-based models have the advantage of informative and fluent replies.




1. Neural Generative Models




The successful application of deep learning to machine translation, namely neural machine translation, sparked enthusiasm for research on neural generative dialogue. The current hot topics in neural generative models are as follows.





1.1 Sequence-to-Sequence Models




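Given an input sequence (message) of $T$ words

$$X = (x_1, x_2, \ldots, x_T)$$

and a target sequence (response) of length $T'$

$$Y = (y_1, y_2, \ldots, y_{T'})$$

the model maximizes the conditional probability of $Y$ given $X$:

$$p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T)$$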


 






Specifically, a Seq2Seq model has an encoder-decoder structure, as shown in the figure below:









The encoder reads $X$ word by word and represents it as a context vector $c$ through a recurrent neural network (RNN); the decoder then takes $c$ as input and estimates the generation probability of $Y$.




Encoder:




The encoding process is simple: an RNN (usually an LSTM) is used directly to produce the semantic vector:

$$h_t = f(x_t, h_{t-1})$$

where $f$ is a nonlinear function such as an LSTM or GRU, $h_{t-1}$ is the output of the previous hidden node, and $x_t$ is the input at the current time step. The vector $c$ is usually the last hidden node of the RNN ($h$, the hidden state), or a weighted sum of several hidden nodes.
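The encoder recurrence $h_t = f(x_t, h_{t-1})$ can be sketched in plain Python. As an assumption for brevity, a simple tanh RNN cell stands in for the LSTM or GRU, and the weight matrices `W` and `U` and the toy inputs are made-up values, not trained parameters.

```python
import math


def rnn_step(x, h_prev, W, U):
    """One step h_t = tanh(W x_t + U h_{t-1}); a plain RNN cell replaces LSTM/GRU."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hi for u, hi in zip(U[i], h_prev))
        )
        for i in range(len(h_prev))
    ]


def encode(inputs, W, U, hidden_size):
    """Read the input vectors one by one and return all states plus c."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = rnn_step(x, h, W, U)
        states.append(h)
    return states, h  # c is taken to be the last hidden state


# Hypothetical toy parameters and a three-word input sequence.
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.2, 0.0], [0.0, 0.2]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states, c = encode(X, W, U, hidden_size=2)
print(c)
```

Returning every intermediate state alongside `c` keeps the door open for the other option mentioned above, where `c` is a weighted sum of several hidden states rather than only the last one.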




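Decoder:

The decoder uses another RNN to predict the current output symbol $y_t$ from the current hidden state $h_t$; both $h_t$ and $y_t$ depend on the previous hidden state and the previous output. With $T'$ the length of the response, the Seq2Seq objective function is defined as:

$$p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T) = p(y_1 \mid c) \prod_{t=2}^{T'} p(y_t \mid c, y_1, \ldots, y_{t-1})$$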
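The chain-rule factorization behind the Seq2Seq objective can be checked numerically. In this minimal sketch the per-step probabilities are invented numbers standing in for the decoder's softmax outputs; only the arithmetic is the point.

```python
import math


def sequence_log_prob(step_probs):
    """log p(Y | X) as the sum of log p(y_t | c, y_1, ..., y_{t-1})."""
    return sum(math.log(p) for p in step_probs)


# Hypothetical probabilities the decoder assigns to each reference token.
step_probs = [0.5, 0.4, 0.9]

log_p = sequence_log_prob(step_probs)
nll = -log_p  # training typically minimizes this negative log-likelihood

print(math.exp(log_p))  # the product of the per-step probabilities
print(nll)
```

Working in log space turns the product into a sum, which avoids numerical underflow when responses are long.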









1.2 Dialogue Context




Taking the dialogue context into account is key to building a dialogue system: it keeps the conversation coherent and improves the user experience. A hierarchical RNN model is used to capture the meaning of individual utterances and then integrate them into a complete dialogue.





In addition, the hierarchical structure is extended with attention mechanisms at the word level and the utterance level respectively.




Experiments show that:




*
Hierarchical RNNs generally outperform non-hierarchical RNNs;




*
When context information is taken into account, neural networks tend to produce longer, more meaningful, and more diverse replies.









The model in the figure above addresses context-sensitive response generation by representing the whole dialogue history (including the current message) with continuous representations, that is, embeddings of words and phrases.









In the architecture shown above, the authors introduce two levels of attention so that the model automatically learns the importance of words and of utterances, and thus generates the next turn of the dialogue better.




At the utterance level the information is learned in reverse order: a later utterance is more likely to contain information from the one before it, so overall the model consumes the content of the dialogue turns in reverse.
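The two-level attention idea can be illustrated with a small sketch. It assumes dot-product scoring against a single query vector, which is a simplification chosen here for clarity; the scoring function used in the paper may differ, and all vectors below are toy values.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, vectors):
    """Dot-product attention: weight each vector, then take the weighted sum."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    context = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
    return weights, context


def hierarchical_attend(query, utterances):
    """Word-level attention inside each utterance, then utterance-level attention."""
    utterance_vectors = [attend(query, words)[1] for words in utterances]
    return attend(query, utterance_vectors)


# Two utterances, each a list of toy word vectors.
utterances = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 2.0]],
]
query = [1.0, 0.0]  # stands in for the decoder state

weights, context = hierarchical_attend(query, utterances)
print(weights)
print(context)
```

The first utterance contains a word aligned with the query, so it receives the larger utterance-level weight.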




1.3 Response Diversity


A challenging problem in current Seq2Seq dialogue systems is that they tend to produce trivial, non-committal, generic replies with almost no meaning, such as "I don't know" or "I am OK".




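One very effective way to address this is to find and set a better objective function. Another is to increase the complexity of the model. The paper shown in the figure below, "Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models", uses a latent variable to tackle the problem of meaningless replies.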









1.4 Topic and Personality


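Modeling the intrinsic attributes of a conversation explicitly is another way to improve diversity and ensure consistency. Among the different attributes, topic and personality have been studied most widely.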


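In the model shown below, the authors observe that people often associate their conversation with topically related concepts and form their replies according to those concepts. They use a Twitter LDA model to obtain the topic of the input, feed the topic information and the input representation into a joint attention module, and generate a topic-related response.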









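The model shown below proposes a two-stage training approach: the model is first initialized with large-scale data and then fine-tuned to generate personalized responses.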









1.5 Outside Knowledge Base


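An important difference between human conversation and dialogue systems is whether the conversation is grounded in reality. Incorporating an external knowledge base (KB) is a promising way to bridge the gap in background knowledge between a dialogue system and a human.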




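Memory networks (Memory Network) are a classic approach to question answering over a knowledge base, so they carry over quite directly to dialogue generation. Studies show that the proposed model can generate natural and correct answers to questions by referring to facts in the knowledge base.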









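The figure above shows the fully data-driven, knowledge-grounded dialogue model proposed by the authors. World Facts is a collection of sentences, some vetted by authoritative sources and some inaccurate, that serves as the knowledge base.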




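Given an input S and the dialogue history, the relevant facts must be retrieved from the fact collection. An IR engine is used for this retrieval, and the retrieved facts then pass through a Fact Encoder for fact injection.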





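The model above is a novel, fully data-driven, knowledge-grounded neural conversation model whose aim is to produce more contentful replies without slot filling. The authors generalize the widely used Seq2Seq approach by conditioning responses on both the conversation history and external "facts".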




1.6 Evaluation


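Evaluating the quality of generated replies is an important aspect of dialogue systems. Task-oriented dialogue systems can be evaluated against human-generated supervised signals, such as task completion tests or user satisfaction scores.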




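However, because of the high diversity of possible replies, automatically evaluating the quality of responses produced by non-task-oriented dialogue systems remains an open problem. Current approaches include the following: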




*
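Computing the BLEU score, that is, directly measuring the word overlap between the ground truth and the generated reply. Because a single utterance can have many valid replies, BLEU is in some respects poorly suited to dialogue evaluation.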







*
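Computing embedding distance. There are three variants: summing the vectors and averaging, taking absolute values first and then averaging, and greedy matching.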







*
Measuring diversity, which depends mainly on the number of distinct n-grams and the entropy.




*
Running a Turing test, using a retrieval discriminator to evaluate the generated replies.
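Two of the metrics in the list above are simple enough to sketch directly: the distinct n-gram ratio for diversity and the embedding-average similarity. The whitespace tokenization and the tiny two-dimensional embedding `table` are assumptions for illustration; a real evaluation would use a proper tokenizer and pretrained word vectors.

```python
import math


def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams over all responses."""
    total = 0
    unique = set()
    for response in responses:
        tokens = response.lower().split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


def average_embedding(tokens, table):
    """Mean of the word vectors found in the table (zeros if none are found)."""
    dim = len(next(iter(table.values())))
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_average_score(reference, response, table):
    """Cosine similarity between the averaged embeddings of two sentences."""
    ref = average_embedding(reference.lower().split(), table)
    hyp = average_embedding(response.lower().split(), table)
    return cosine(ref, hyp)


responses = ["i do not know", "i do not know", "i like jazz music"]
print(distinct_n(responses, 1), distinct_n(responses, 2))

# Hypothetical toy embeddings.
table = {"good": [1.0, 0.0], "great": [0.9, 0.1], "bad": [-1.0, 0.0]}
print(embedding_average_score("good", "great", table))
```

A low distinct-n value signals that the system keeps repeating the same generic phrases, which is exactly the failure mode described in the response diversity section.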




2. Retrieval-Based Methods




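Retrieval-based methods choose a reply from a set of candidate replies. The key to retrieval is message-response matching: the matching algorithm must bridge the semantic gap between message and response.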
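The selection step can be sketched as scoring every candidate against the message and returning the best one. Note the assumption: the bag-of-words cosine used here is a purely lexical stand-in and does not bridge the semantic gap; in a real system `match_score` would be replaced by a learned matching model.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word counts for a whitespace-tokenized, lower-cased sentence."""
    return Counter(text.lower().split())


def match_score(message, reply):
    """Lexical cosine similarity; a placeholder for a learned matching model."""
    a, b = bag_of_words(message), bag_of_words(reply)
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_reply(message, candidates):
    """Return the candidate reply with the highest matching score."""
    return max(candidates, key=lambda reply: match_score(message, reply))


candidates = ["i like pizza", "the store opens at nine"]
print(select_reply("what time does the store open", candidates))
```

Because the scorer is a separate function, the retrieval loop stays the same when the lexical score is swapped for a neural one.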



2.1 Single-Turn Response Matching


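Early work on retrieval-based chatbots focused mainly on response selection for single-turn conversation, where only the message is used to select a suitable reply.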


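More recent methods, as shown in the figure below, improve the model with deep convolutional neural network architectures: they learn representations of the message and the response, or directly learn an interaction representation of the two sentences, and then use a multilayer perceptron to compute the matching score.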









2.2 Multi-Turn Response Matching


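In recent years, retrieval-based multi-turn conversation has attracted increasing attention. In multi-turn response selection, the current message and the previous utterances are taken as input.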




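The model selects a response that is natural and relevant to the whole context. It is important to identify the important information in the previous utterances and to model the relationships among utterances properly, so that the conversation stays coherent.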


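What makes multi-turn dialogue hard is that the model must consider not only the current question but also the preceding turns. There are two main difficulties: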




*
How to identify the key information in the context (keywords, key phrases, or key sentences);




*
How to model the relationships among the turns in the context.




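The flaw of existing retrieval models is that they easily lose important information in the context, because they first represent the whole context as a single vector and then match that context vector against the response sentence vector.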




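The method shown below uses an RNN/LSTM to encode the context (the concatenation of all previous utterances and the current message) and the candidate response into a context vector and a response vector respectively, and then computes a matching score from the two vectors.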










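Retrieval-based chit-chat has so far largely remained single-turn; the paper shown below proposes a retrieval-based multi-turn chit-chat architecture. It further improves the use of utterance relationships and contextual information by matching the utterances in the context at different levels of a convolutional neural network, and then accumulating the resulting vectors in chronological order with a recurrent neural network to model the relationships among utterances.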









2.3 Hybrid Methods




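Combining generative and retrieval-based methods can significantly improve system performance. Retrieval-based systems usually give precise but rather stiff answers, whereas generative systems tend to give fluent but meaningless ones.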









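In the ensemble model, the retrieved candidates are fed, together with the original message, into an RNN-based response generator. This approach combines the strengths of retrieval and generative models, which gives it a considerable performance advantage.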




Future directions





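Deep learning has become a fundamental technique for dialogue systems. Researchers have applied neural networks to the different components of traditional task-oriented dialogue systems, including natural language understanding, natural language generation, and dialogue state tracking. In recent years, end-to-end frameworks have become popular not only in non-task-oriented chit-chat systems but increasingly in task-oriented systems as well.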





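Deep learning can exploit large amounts of data, which blurs the boundary between task-oriented and non-task-oriented dialogue systems. It is worth noting, however, that current end-to-end models are still far from perfect, and despite the achievements described above these problems remain challenging. Next, we discuss some possible research directions.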





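Fast adaptation. Although end-to-end models are attracting more and more attention from researchers, real-world engineering still has to rely on the traditional pipeline approach, especially in new domains, where collecting domain-specific dialogue data and building a dialogue system are difficult. The future trend is for dialogue models to be able to learn actively from their interactions with people.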





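Deep understanding. Current neural dialogue systems depend heavily on large amounts of annotated data, structured knowledge bases, and dialogue corpora. The replies they produce still lack diversity in a sense and are sometimes not very meaningful, so dialogue systems must be able to understand language and the real world more deeply and effectively.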


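Privacy protection. Widely deployed dialogue systems serve more and more people, and it is worth noting that we are all using the same dialogue assistant. Through its ability to learn by interacting, understanding, and reasoning, a dialogue assistant can inadvertently and covertly store sensitive information. Protecting users' privacy is therefore very important when building better dialogue systems.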