JEDI: These aren't the JSON documents you're looking for...
Autoři
Hütter, T.; Augsten, N.; Kirsch, C.; Carey, M.J.; Li, C.
Rok
2022
Publikováno
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data. New York: Association for Computing Machinery, 2022. p. 1584-1597. ISSN 0730-8078. ISBN 978-1-4503-9249-5.
Typ
Stať ve sborníku
Pracoviště
Anotace
The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this paper, we address the problem of JSON similarity lookup queries: given a query document and a distance threshold 𝜏, retrieve
all documents that are within 𝜏 from the query document. Different from other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document which poses a new challenge to the tree model and distance computation. We propose JSON tree, a lossless tree representation of JSON documents, and define the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON. We develop QuickJEDI, an algorithm that computes JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. To boost the performance of JSON similarity queries, we introduce an index called JSIM and an effective upper bound based on tree sorting. Our upper bound algorithm runs in 𝑂 (𝑛𝜏) time and 𝑂 (𝑛 +𝜏 log 𝑛) space, which substantially improves the previous best bound of 𝑂 (𝑛2) time and
𝑂 (𝑛 log 𝑛) space (where 𝑛 is the tree size). Our experimental evaluation shows that our solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
ASE: A Value Set Decision Procedure for Symbolic Execution
Autoři
Abyaneh, A.S.; Kirsch, C.
Rok
2021
Publikováno
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering. IEEE Press, 2021. p. 203-214. ISBN 978-1-6654-0337-5.
Typ
Stať ve sborníku
Anotace
A symbolic execution engine regularly queries a Satisfiability Modulo Theory (SMT) solver to determine reachability of code during execution. Unfortunately, the SMT solver is often the bottleneck of symbolic execution. Inspired by abstract interpretation, we propose an abstract symbolic execution (ASE) engine which aims at querying the SMT solver less often by trying to compute reachability faster through an increasingly weaker abstraction. For this purpose, we have designed and implemented a value set decision procedure based on strided value interval (SVI) sets for efficiently determining precise, or under-approximating value sets for variables. Our ASE engine begins reasoning with respect to the SVI abstraction, and then only if needed uses the theory of bit-vectors implemented in SMT solvers. Our ASE engine efficiently detects when the former abstraction becomes incomplete to move on and try the next abstraction.
We have designed and implemented a prototype of our engine for a subset of 64-bit RISC-V. Our experimental evaluation shows that our prototype often improves symbolic execution time by significantly reducing the number of SMT queries while, whenever the abstraction does not work, the overhead for trying still remains low.
What we eval in the shadows: A large-scale study of eval in R programs
Autoři
Rok
2021
Publikováno
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.
Typ
Článek
DOI
Pracoviště
Anotace
Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.