TY - GEN
T1 - NLyze
T2 - 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
AU - Gulwani, Sumit
AU - Marron, Mark
PY - 2014
Y1 - 2014
N2 - Millions of computer end users need to perform tasks over tabular spreadsheet data, yet lack the programming knowledge to do such tasks automatically. This paper describes the design and implementation of a robust natural language based interface to spreadsheet programming. Our methodology involves designing a typed domain-specific language (DSL) that supports an expressive algebra of map, filter, reduce, join, and formatting capabilities at a level of abstraction appropriate for non-expert users. The key algorithmic component of our methodology is a translation algorithm for converting a natural language specification in the context of a given spreadsheet to a ranked set of likely programs in the DSL. The translation algorithm leverages the spreadsheet spatial and temporal context to assign interpretations to specifications with implicit references, and is thus robust to a variety of ways in which end users can express the same task. The translation algorithm builds over ideas from keyword programming and semantic parsing to achieve both high precision and high recall. We implemented the system as an Excel add-in called NLyze that supports a rich user interaction model including annotating the user's natural language specification and explaining the synthesized DSL programs by paraphrasing them into structured English. We collected a total of 3570 English descriptions for 40 spreadsheet tasks and our system was able to generate the intended interpretation as the top candidate for 94% (97% for the top 3) of those instances.
AB - Millions of computer end users need to perform tasks over tabular spreadsheet data, yet lack the programming knowledge to do such tasks automatically. This paper describes the design and implementation of a robust natural language based interface to spreadsheet programming. Our methodology involves designing a typed domain-specific language (DSL) that supports an expressive algebra of map, filter, reduce, join, and formatting capabilities at a level of abstraction appropriate for non-expert users. The key algorithmic component of our methodology is a translation algorithm for converting a natural language specification in the context of a given spreadsheet to a ranked set of likely programs in the DSL. The translation algorithm leverages the spreadsheet spatial and temporal context to assign interpretations to specifications with implicit references, and is thus robust to a variety of ways in which end users can express the same task. The translation algorithm builds over ideas from keyword programming and semantic parsing to achieve both high precision and high recall. We implemented the system as an Excel add-in called NLyze that supports a rich user interaction model including annotating the user's natural language specification and explaining the synthesized DSL programs by paraphrasing them into structured English. We collected a total of 3570 English descriptions for 40 spreadsheet tasks and our system was able to generate the intended interpretation as the top candidate for 94% (97% for the top 3) of those instances.
KW - End-user programming
KW - Program synthesis
KW - Spreadsheet programming
KW - User intent
UR - http://www.scopus.com/inward/record.url?scp=84904304215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904304215&partnerID=8YFLogxK
U2 - 10.1145/2588555.2612177
DO - 10.1145/2588555.2612177
M3 - Conference contribution
AN - SCOPUS:84904304215
SN - 9781450323765
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 803
EP - 814
BT - SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Y2 - 22 June 2014 through 27 June 2014
ER -