NLyze: Interactive programming by natural language for spreadsheet data analysis and manipulation

Sumit Gulwani, Mark Marron

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

90 Scopus citations

Abstract

Millions of computer end users need to perform tasks over tabular spreadsheet data, yet lack the programming knowledge to do such tasks automatically. This paper describes the design and implementation of a robust natural language based interface to spreadsheet programming. Our methodology involves designing a typed domain-specific language (DSL) that supports an expressive algebra of map, filter, reduce, join, and formatting capabilities at a level of abstraction appropriate for non-expert users. The key algorithmic component of our methodology is a translation algorithm for converting a natural language specification in the context of a given spreadsheet to a ranked set of likely programs in the DSL. The translation algorithm leverages the spreadsheet spatial and temporal context to assign interpretations to specifications with implicit references, and is thus robust to a variety of ways in which end users can express the same task. The translation algorithm builds over ideas from keyword programming and semantic parsing to achieve both high precision and high recall. We implemented the system as an Excel add-in called NLyze that supports a rich user interaction model including annotating the user's natural language specification and explaining the synthesized DSL programs by paraphrasing them into structured English. We collected a total of 3570 English descriptions for 40 spreadsheet tasks and our system was able to generate the intended interpretation as the top candidate for 94% (97% for the top 3) of those instances.
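The abstract describes translating a natural-language specification into a ranked set of DSL programs and then paraphrasing each program back into structured English. The toy sketch below illustrates that general pipeline with a keyword-overlap ranker over a tiny "reduce with optional filter" language. It is not NLyze's DSL or its translation algorithm. The `Reduce` class, the `OP_WORDS` synonym table, and the scoring weights are hypothetical assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Union

Value = Union[str, float]
Row = Dict[str, Value]
Table = List[Row]

# Hypothetical synonym table linking natural-language words to aggregate operators.
OP_WORDS: Dict[str, Set[str]] = {
    "sum": {"sum", "total", "add"},
    "count": {"count", "number", "many"},
    "avg": {"average", "mean", "avg"},
}


@dataclass(frozen=True)
class Reduce:
    """Toy DSL program: aggregate a numeric column, optionally filtered by column == value."""
    op: str
    column: str
    where_col: Optional[str] = None
    where_val: Optional[Value] = None

    def run(self, table: Table) -> float:
        # Filter step, then reduce step.
        rows = [r for r in table
                if self.where_col is None or r[self.where_col] == self.where_val]
        vals = [float(r[self.column]) for r in rows]
        if self.op == "sum":
            return sum(vals)
        if self.op == "count":
            return float(len(vals))
        if self.op == "avg":
            return sum(vals) / len(vals) if vals else 0.0
        raise ValueError(f"unknown operator: {self.op}")

    def describe(self) -> str:
        # Paraphrase the program back into structured English.
        text = f"{self.op} of {self.column}"
        if self.where_col is not None:
            text += f" where {self.where_col} is {self.where_val}"
        return text


def candidates(table: Table) -> List[Reduce]:
    """Enumerate every well-typed program over the table's columns."""
    cols = list(table[0].keys())
    numeric = [c for c in cols if all(isinstance(r[c], (int, float)) for r in table)]
    textual = [c for c in cols if c not in numeric]
    progs: List[Reduce] = []
    for op in OP_WORDS:
        for c in numeric:
            progs.append(Reduce(op, c))
            for wc in textual:
                for v in sorted({r[wc] for r in table}):
                    progs.append(Reduce(op, c, wc, v))
    return progs


def score(prog: Reduce, tokens: Set[str]) -> float:
    """Hypothetical scoring: one point per matched slot (operator, column, filter value)."""
    s = 1.0 if tokens & OP_WORDS[prog.op] else 0.0
    s += 1.0 if prog.column.lower() in tokens else 0.0
    if prog.where_col is not None:
        # A filter earns credit only if its value is mentioned; otherwise it is penalized.
        s += 1.0 if str(prog.where_val).lower() in tokens else -1.0
    return s


def rank(spec: str, table: Table, k: int = 3) -> List[Reduce]:
    """Return the k highest-scoring candidate programs for a natural-language spec."""
    tokens = set(spec.lower().replace("?", " ").split())
    return sorted(candidates(table), key=lambda p: score(p, tokens), reverse=True)[:k]


if __name__ == "__main__":
    sales = [
        {"region": "East", "sales": 100.0},
        {"region": "West", "sales": 50.0},
        {"region": "East", "sales": 25.0},
    ]
    best = rank("total sales for East", sales)[0]
    print(best.describe(), best.run(sales))  # sum of sales where region is East 125.0
```

Exhaustively enumerating candidates is practical here only because the toy grammar is tiny. The approach described in the abstract also draws on the spreadsheet's spatial and temporal context to resolve implicit references, which this sketch ignores.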

Original language: English
Title of host publication: SIGMOD 2014 - Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
Pages: 803-814
Number of pages: 12
State: Published - 2014
Event: 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014 - Snowbird, UT, United States
Duration: Jun 22 2014 - Jun 27 2014

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 2014
Country/Territory: United States
City: Snowbird, UT
Period: 6/22/14 - 6/27/14

Keywords

  • End-user programming
  • Program synthesis
  • Spreadsheet programming
  • User intent

ASJC Scopus subject areas

  • Software
  • Information Systems
