TY - JOUR
T1 - Mining semantic loop idioms
AU - Allamanis, Miltiadis
AU - Barr, Earl T.
AU - Bird, Christian
AU - Devanbu, Premkumar
AU - Marron, Mark
AU - Sutton, Charles
N1 - Publisher Copyright: © 1976-2012 IEEE.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - To write code, developers stitch together patterns, like API protocols or data structure traversals. Discovering these patterns can identify inconsistencies in code or opportunities to replace these patterns with an API or a language construct. We present coiling, a technique for automatically mining code for semantic idioms: surprisingly probable, semantic patterns. We specialize coiling for loop idioms, semantic idioms of loops. First, we show that automatically identifiable patterns exist, in great numbers, with a large-scale empirical study of loops over 25MLOC. We find that most loops in this corpus are simple and predictable: 90 percent have fewer than 15LOC and 90 percent have no nesting and very simple control. Encouraged by this result, we then mine loop idioms over a second, buildable corpus. Over this corpus, we show that only 50 loop idioms cover 50 percent of the concrete loops. Our framework opens the door to data-driven tool and language design, discovering opportunities to introduce new API calls and language constructs. Loop idioms show that LINQ would benefit from an Enumerate operator. This can be confirmed by the existence of a StackOverflow question with 542k views that requests precisely this feature.
AB - To write code, developers stitch together patterns, like API protocols or data structure traversals. Discovering these patterns can identify inconsistencies in code or opportunities to replace these patterns with an API or a language construct. We present coiling, a technique for automatically mining code for semantic idioms: surprisingly probable, semantic patterns. We specialize coiling for loop idioms, semantic idioms of loops. First, we show that automatically identifiable patterns exist, in great numbers, with a large-scale empirical study of loops over 25MLOC. We find that most loops in this corpus are simple and predictable: 90 percent have fewer than 15LOC and 90 percent have no nesting and very simple control. Encouraged by this result, we then mine loop idioms over a second, buildable corpus. Over this corpus, we show that only 50 loop idioms cover 50 percent of the concrete loops. Our framework opens the door to data-driven tool and language design, discovering opportunities to introduce new API calls and language constructs. Loop idioms show that LINQ would benefit from an Enumerate operator. This can be confirmed by the existence of a StackOverflow question with 542k views that requests precisely this feature.
KW - code patterns
KW - Data-driven tool design
KW - idiom mining
UR - http://www.scopus.com/inward/record.url?scp=85046480947&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046480947&partnerID=8YFLogxK
U2 - 10.1109/TSE.2018.2832048
DO - 10.1109/TSE.2018.2832048
M3 - Article
AN - SCOPUS:85046480947
SN - 0098-5589
VL - 44
SP - 651
EP - 668
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 7
ER -