Keyphrases provide highly summative information that can be effectively used for understanding, organizing, and retrieving text content. Though previous studies have provided many workable solutions for automated keyphrase extraction, they commonly divided the to-be-summarized content into multiple text chunks, then ranked and selected the most meaningful ones. These approaches could neither identify keyphrases that do not appear in the text nor capture the real semantic meaning behind the text. We propose a generative model for keyphrase prediction with an encoder-decoder framework, which can effectively overcome the above drawbacks. We name it deep keyphrase generation since it attempts to capture the deep semantic meaning of the content with a deep learning method. Empirical analysis on six datasets demonstrates that our proposed model not only achieves a significant performance boost on extracting keyphrases that appear in the source text, but can also generate absent keyphrases based on the semantic meaning of the text. Code and dataset are available at https://github.com/memray/seq2seqkeyphrase.
|Title of host publication||ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)|
|Number of pages||11|
|State||Published - 2017|
|Event||55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada|
Duration: Jul 30 2017 → Aug 4 2017
Bibliographical note
Funding Information:
We would like to thank Jiatao Gu and Miltiadis Allamanis for sharing the source code and giving helpful advice. We also thank Wei Lu, Yong Huang, Qikai Cheng, and other IRLAB members at Wuhan University for their assistance with dataset development. This work is partially supported by the National Science Foundation under Grant No. 1525186.
© 2017 Association for Computational Linguistics.
ASJC Scopus subject areas
- Language and Linguistics
- Artificial Intelligence
- Linguistics and Language