Shenzhen University
College of Computer Science and Software Engineering, SZU

 Efficient Dialog Policy Learning by Reasoning with Contextual Knowledge

AAAI Conference on Artificial Intelligence (AAAI)

 

Haodi Zhang1    Zhichao Zeng1    Keting Lu2    Kaishun Wu1*    Shiqi Zhang3

1Shenzhen University    2Baidu, Inc    3SUNY Binghamton

 

Abstract

Goal-oriented dialog policy learning algorithms aim to learn a dialog policy that selects language actions based on the current dialog state. Deep reinforcement learning (DRL) methods have been widely used for dialog policy learning. This work is motivated by the observation that, although dialog is a domain rich in contextual knowledge, reinforcement learning methods are ill-equipped to incorporate such knowledge into the dialog policy learning process. In this paper, we develop a DRL framework for goal-oriented dialog policy learning that learns user preferences from user goal data while leveraging human-provided commonsense knowledge. The framework has been evaluated on a realistic dialog simulation platform. Compared with baselines from the literature and ablations of our approach, we observe significant improvements in both learning efficiency and the quality of the computed action policies. (Code: https://github.com/ResearchGroupHdZhang/DPL_AAAI22)

 

Figure 1: Overview of the framework. The framework consists of three main components: internal knowledge (MLN-based), external knowledge (ASP-based), and dialog policy learning (DRL-based). The MLN-based probabilistic reasoner collects facts about internal factors and provides user preferences to the DRL component for system action generation. The ASP-based logical reasoner collects facts about external factors and reasons about them to provide potentially better service when the dialog terminates. Solid lines indicate the data flow, and dashed lines represent the function calls after a dialog terminates.
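For readers who prefer code to block diagrams, the sketch below shows one plausible way the three components could be wired together. It is a minimal illustration under assumed interfaces, not the authors' implementation; the names MLNReasoner, ASPReasoner, DRLAgent, and the environment API are all hypothetical.

# Minimal control-flow sketch of Figure 1; every interface here is
# hypothetical, not the authors' code.
class DialogAgent:
    def __init__(self, mln_reasoner, asp_reasoner, drl_agent):
        self.mln = mln_reasoner  # probabilistic reasoning over internal factors
        self.asp = asp_reasoner  # logical reasoning over external factors
        self.drl = drl_agent     # any DRL policy learner (A2C, DQN, ACER, BBQN)

    def run_episode(self, env):
        obs = env.reset()
        done = False
        while not done:
            # Internal knowledge: predict user preferences from observed
            # facts and append them to the state fed to the policy
            # (the solid lines in Figure 1).
            prefs = self.mln.predict_preferences(obs)
            state = obs + prefs
            action = self.drl.select_action(state)
            obs, reward, done = env.step(action)
            self.drl.store_transition(state, action, reward, done)
        # External knowledge: after the dialog terminates, reason about
        # external factors to refine the final service (the dashed lines).
        return self.asp.refine_service(env.final_result())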

 

Figure 2: Performance comparison. Subfigures (a)-(d) show the results of our approach deployed on four popular dialog policy learning algorithms (A2C, DQN, ACER and BBQN). Each subfigure shows our approach (DRL+MLN+ASP), its ablation (DRL+MLN), and the standard DRL-based dialog agent. Subfigure (e) shows policy learning with different sizes of training data for the MLN. Subfigure (f) compares different levels of external knowledge. Both (e) and (f) use the A2C algorithm.

Figure 3: A sample dialog. The left part shows a successful dialog in abstract form, and the corresponding dialog in natural language is shown in the right part. The middle part illustrates how contextual knowledge helps the agent predict user preferences from the current observation.
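To make the middle part of the figure concrete, below is a toy illustration of how weighted contextual rules could map the current observation to preference probabilities. The rules, weights, and the sigmoid aggregation are invented for illustration and are far simpler than actual MLN inference.

import math

# Toy MLN-style preference prediction. The rules and weights are invented;
# real Markov Logic Network inference is more involved than this.
RULES = [
    # (weight, condition over observed facts, predicted preference)
    (1.5, lambda f: f.get("weather") == "rainy", "indoor_seating"),
    (0.8, lambda f: f.get("time") == "morning", "coffee"),
    (1.2, lambda f: f.get("budget") == "low", "cheap_restaurant"),
]

def predict_preferences(facts):
    """Return {preference: probability} from the rules that fire on facts."""
    scores = {}
    for weight, condition, preference in RULES:
        if condition(facts):
            scores[preference] = scores.get(preference, 0.0) + weight
    # Squash accumulated rule weights into probabilities.
    return {p: 1.0 / (1.0 + math.exp(-s)) for p, s in scores.items()}

# Example: the agent observes rain and a low budget.
print(predict_preferences({"weather": "rainy", "budget": "low"}))
# {'indoor_seating': 0.8175..., 'cheap_restaurant': 0.7685...}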

 

Acknowledgements

This work is partially supported by the National Natural Science Foundation of China (61806132, U2001207, 61872248), Guangdong NSF 2017A030312008, Shenzhen STF (ZDSYS20190902092853047, R2020A045), the Project of DEGP (2019KCXTD005, 2021ZDZX1068), and the Guangdong "Pearl River Talent Recruitment Program" (2019ZT08X603). A portion of this research has taken place at the Autonomous Intelligent Robotics (AIR) Group, SUNY Binghamton. AIR research is supported in part by grants from the National Science Foundation (NRI-1925044), Ford Motor Company (URP Awards 2019-2021), OPPO (Faculty Research Award 2020), and SUNY Research Foundation. Kaishun Wu is the corresponding author.

 

Bibtex

@inproceedings{DBLP:conf/aaai/Zhang,
  author    = {Haodi Zhang and Zhichao Zeng and Keting Lu and Kaishun Wu and Shiqi Zhang},
  title     = {Efficient Dialog Policy Learning by Reasoning with Contextual Knowledge},
  booktitle = {Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence},
  year      = {2022}
}

