In the Programming Research Laboratory (PRL@PRG), we are interested in a deeper understanding of programming languages and their use. The laboratory is led by Prof. Christoph Kirsch of the University of Salzburg, Austria.
The lab’s activities are focused on these topics
Projects
Rigorous Engineering of Data Analysis Pipelines (RiGiD)
Program
Grantové projekty excelence v základním výzkumu EXPRO
Provider
Czech Science Foundation
Investigators
Code
GX23-07580X
Period
2023 - 2027
Description
The RiGiD project lays the groundwork for this research programme and aims to develop a methodology for rigorous engineering of data analysis pipelines that can be adopted in practice. Our approach is pragmatic. Rather than chasing functional correctness, we hope to substantially reduce the incidence of errors in the wild. The research is structured in three overlapping chapters. First, identify the problem by carrying out user studies and large-scale program analysis of a corpus of over 100,000 data science pipelines. The outcome will be a catalog of error patterns as well as a labeled dataset to be shared with other researchers. The technical advances will focus on combining dynamic and static program analysis to approximate the behavior of partial programs and programs written in highly dynamic languages. The second part of our effort proposes a methodology and tooling for developing data sciences codes with reduced error rates. The technical contributions of this part of the project focus on lightweight specification techniques and, in particular, the development of a novel gradual typing system that deals with common programming idioms found in our corpus. This includes various forms of object orientation, data frames, and rich value specifications. These specifications are complemented with an automated test generation technique that combines test and input synthesis with fuzzing and test minimization. Finally, the execution environment is extended to support automatic reproducibility and result audits through data lineage. The third and last part of the work evaluates the proposal by conducting user studies and developing tools for automating deployment. The contribution will be a qualitative and quantitative assessment of the RiGiD methodology and tooling. The technical contribution will be tools that leverage program analysis to infer approximate specifications to assist deployment and adoption. Our tools target R, a language for data analytics with 2 milli
Evolving Language Ecosystems
Program
Horizon 2020
Provider
European Commission
Investigators
Code
695412
Period
2016 - 2022
Description
The Evolving Language Ecosystems project explores the fundamental techniques and algorithms for evolving programming languages and their ecosystems. Our purpose is to reduce the cost of wide-ranging language changes and obviate the need for devising entirely new languages. Our findings will grant both researchers and practitioners a greater degree of freedom when experimenting with new ideas on how to express computation.
Big Code: Scalable Analysis of Massive Code Bases
Program
Operational Programme – Research, Development and Education – Structural Funds EU
Provider
European Commission
Investigators
Code
EF15_003/0000421, CZ.02.1.01/0.0/0.0/15_003/0000421
Period
2019 - 2022
Description
The project aims to create at the FIT CTU the Institute of Scalable Code Analytics (ISCA), the first research centre in the CR focused on analyses of large code bases available on the Internet. Software systems are written in source code; BigCode refers to the massive codebases on the Internet. Combining techniques from programming languages and statistical machine learning will allow the mining of these codebases for crucial insights. The requested funds will be invested, in part, to provide the FIT with the first hardware and software infrastructure for big code data analysis. The other part of the research funding will attract internationally renowned researchers in the field of computer languages. The team is synergistic with existing research capacities at the FIT in software and knowledge engineering, data mining and parallel computing. The new team is well connected internationally and will bring investment from leading industrial partners that include Google and Oracle.
Publications
Deoptless: Speculation with Dispatched On-Stack Replacement and Specialized Continuations
Authors
Flückiger, O.; Ječmen, J.; Krynski, S.; Vitek, J.
Year
2022
Published
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York: Association for Computing Machinery, 2022. p. 749-761. ISBN 978-1-4503-9265-5.
Type
Proceedings paper
Annotation
Just-in-time compilation provides significant performance improvements for programs written in dynamic languages. These benefits come from the ability of the compiler to speculate about likely cases and generate optimized code for these. Unavoidably, speculations sometimes fail and the optimizations must be reverted. In some pathological cases, this can leave the program stuck with suboptimal code. In this paper we propose deoptless, a technique that replaces deoptimization points with dispatched specialized continuations. The goal of deoptless is to take a step towards providing users with a more transparent performance model in which mysterious slowdowns are less frequent and grave.
Type stability in Julia: Avoiding performance pathologies in JIT compilation
Authors
Pelenitsyn, A.; Belyakova, J.; Chung, B.; Tate, R.; Vitek, J.
Year
2021
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5 1-26. ISSN 2475-1421.
Type
Article
DOI
Departments
Annotation
As a scientific programming language, Julia strives for performance but also provides high-level productivity features. To avoid performance pathologies, Julia users are expected to adhere to a coding discipline that enables so-called type stability. Informally, a function is type stable if the type of the output depends only on the types of the inputs, not their values. This paper provides a formal definition of type stability as well as a stronger property of type groundedness, shows that groundedness enables compiler optimizations, and proves the compiler correct. We also perform a corpus analysis to uncover how these type-related properties manifest in practice.
Promises Are Made To Be Broken: Migrating R to Strict Semantics
Authors
Goel, A.; Ječmen, J.; Krynski, S.; Flückiger, O.; Vitek, J.
Year
2021
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), 1-20. ISSN 2475-1421.
Type
Article
DOI
Annotation
Function calls in the R language do not evaluate their arguments, these are passed to the callee as suspended computations and evaluated if needed. After 25 years of experience with the language, there are very few cases where programmers leverage delayed evaluation intentionally and laziness comes at a price in performance and complexity. This paper explores how to evolve the semantics of a lazy language towards strictness-by-default and laziness-on-demand. To provide a migration path, it is necessary to provide tooling for developers to migrate libraries without introducing errors. This paper reports on a dynamic analysis that infers strictness signatures for functions to capture both intentional and accidental laziness. Over 99% of the inferred signatures were correct when tested against clients of the libraries.
What we eval in the shadows: A large-scale study of eval in R programs
Authors
Year
2021
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.
Type
Article
DOI
Departments
Annotation
Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.
First-class environments in R
Authors
Goel, A.; Vitek, J.
Year
2021
Published
DLS 2021: Proceedings of the 17th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2021. p. 12-22. ISBN 9781450391054.
Type
Proceedings paper
Departments
Annotation
The R programming language is widely used for statistical computing. To enable interactive data exploration and rapid prototyping, R encourages a dynamic programming style. This programming style is supported by features such as first-class environments. Amongst widely used languages, R has the richest interface for programmatically manipulating environments. With the flexibility afforded by reflective operations on first-class environments, come significant challenges for reasoning and optimizing user-defined code. This paper documents the reflective interface used to operate over first-class environment. We explain the rationale behind its design and conduct a large-scale study of how the interface is used in popular libraries. © 2021 Owner/Author.
Formally verified speculation and deoptimization in a JIT compiler
Authors
Barrière, A.; Blazy, S.; Flückiger, O.; Pichardie, D.; Vitek, J.
Year
2021
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(POPL), 1-26. ISSN 2475-1421.
Type
Article
DOI
Departments
Annotation
Just-in-time compilers for dynamic languages routinely generate code under assumptions that may be invalidated at run-time, this allows for specialization of program code to the common case in order to avoid unnecessary overheads due to uncommon cases. This form of software speculation requires support for deoptimization when some of the assumptions fail to hold. This paper presents a model just-in-time compiler with an intermediate representation that explicits the synchronization points used for deoptimization and the assumptions made by the compiler's speculation. We also present several common compiler optimizations that can leverage speculation to generate improved code. The optimizations are proved correct with the help of a proof assistant. While our work stops short of proving native code generation, we demonstrate how one could use the verified optimization to obtain significant speed ups in an end-to-end setting. © 2021 Owner/Author.
Contextual Dispatch for Function Specialization
Authors
Flückiger, O.; Chari, G.; Yee, M.-H.; Ječmen, J.; Vitek, J.
Year
2020
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-24. ISSN 2475-1421.
Type
Article
DOI
Annotation
In order to generate efficient code, dynamic language compilers often need information, such as dynamic types, not readily available in the program source. Leveraging a mixture of static and dynamic information, these compilers speculate on the missing information. Within one compilation unit, they specialize the generated code to the previously observed behaviors, betting that past is prologue. When speculation fails, the execution must jump back to unoptimized code. In this paper, we propose an approach to further the specialization, by disentangling classes of behaviors into separate optimization units. With contextual dispatch, functions are versioned and each version is compiled under different assumptions. When a function is invoked, the implementation dispatches to a version optimized under assumptions matching the dynamic context of the call. As a proof-of-concept, we describe a compiler for the R language which uses this approach. Our implementation is, on average, 1.7× faster than the GNU R reference implementation. We evaluate contextual dispatch on a set of benchmarks and measure additional speedup, on top of traditional speculation with deoptimization techniques. In this setting contextual dispatch improves the performance of 18 out of 46 programs in our benchmark suite.
Sampling optimized code for type feedback
Authors
Flückiger, O.; Wälchli, A.; Krynski, S.; Vitek, J.
Year
2020
Published
DSL_Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2020. p. 99-111. ISBN 978-1-4503-8175-8.
Type
Proceedings paper
Departments
Annotation
To efficiently execute dynamically typed languages, many language implementations have adopted a two-tier architecture. The first tier aims for low-latency startup times and collects dynamic profiles, such as the dynamic types of variables. The second tier provides high-throughput using an optimizing compiler that specializes code to the recorded type information. If the program behavior changes to the point that not previously seen types occur in specialized code, that specialized code becomes invalid, it is deoptimized, and control is transferred back to the first tier execution engine which will start specializing anew. However, if the program behavior becomes more specific, for instance, if a polymorphic variable becomes monomorphic, nothing changes. Once the program is running optimized code, there are no means to notice that an opportunity for optimization has been missed. We propose to employ a sampling-based profiler to monitor native code without any instrumentation. The absence of instrumentation means that when the profiler is not active, no overhead is incurred. We present an implementation is in the context of the A just-in-time, optimizing compiler for the R language. Based on the sampled profiles, we are able to detect when the native code produced by A is specialized for stale type feedback and recompile it to more type-specific code. We show that sampling adds an overhead of less than 3 © 2020 ACM.
World age in Julia: Optimizing method dispatch in the presence of eval
Authors
Belyakova, J.; Chung, B.; Gelinas, J.; Nash, J.; Vitek, J.
Year
2020
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-26. ISSN 2475-1421.
Type
Article
DOI
Departments
Annotation
Dynamic programming languages face semantic and performance challenges in the presence of features, such as eval, that can inject new code into a running program. The Julia programming language introduces the novel concept of world age to insulate optimized code from one of the most disruptive side-effects of eval: changes to the definition of an existing function. This paper provides the first formal semantics of world age in a core calculus named juliette, and shows how world age enables compiler optimizations, such as inlining, in the presence of eval. While Julia also provides programmers with the means to bypass world age, we found that this mechanism is not used extensively: a static analysis of over 4,000 registered Julia packages shows that only 4-9% of packages bypass world age. This suggests that Julia's semantics aligns with programmer expectations. © 2020 Owner/Author.
Designing types for R, empirically
Authors
Turcotte, A.; Goel, A.; Křikava, F.; Vitek, J.
Year
2020
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-25. ISSN 2475-1421.
Type
Article
DOI
Departments
Annotation
The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work are the two questions: what expressive power is needed to accurately type R code? and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insights in the degree of polymorphism exhibited by idiomatic R code and explore potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.
On the Design, Implementation, and Use of Laziness in R
Authors
Vitek, J.; Goel, A.
Year
2019
Published
Volume 3, Issue OOPSLA. New York: ACM, 2019. p. 1-27. ISSN 2475-1421.
Type
Proceedings paper
DOI
Departments
Annotation
The R programming language has been lazy for over twenty-five years. This paper presents a review of the design and implementation of call-by-need in R, and a data-driven study of how generations of programmers have put laziness to use in their code. We analyze 16,707 packages and observe the creation of 270.9 B promises. Our data suggests that there is little supporting evidence to assert that programmers use laziness to avoid unnecessary computation or to operate over infinite data structures. For the most part R code appears to have been written without reliance on, and in many cases even knowledge of, delayed argument evaluation. The only significant exception is a small number of packages which leverage call-by-need for meta-programming.
Lambdulus: Teaching Lambda Calculus Practically
Authors
Máj, P.; Sliacký, J.
Year
2019
Published
SPLASH-E 2019: Proceedings of the 2019 ACM SIGPLAN Symposium on SPLASH-E. New York: ACM, 2019. p. 57-65. ISBN 9781450369893.
Type
Proceedings paper
Departments
Annotation
λ calculus is a great formal introduction to functional programming. However, its abstract nature poses a challenge for many students as they struggle both with the unfamiliarity of functional programming and with the high abstraction and minimalism of λ calculus. As a result, functional programming classes are often delayed to older students who are hoped to be better prepared for appreciating its qualities, as was the case at the authors’ university as well. 3 years ago, as part of a redesigned curriculum around programming languages education, we have decided to introduce students to various programming paradigms much sooner than before. In doing so, we faced the problem of explaining very theoretical foundational concepts to very young students in a very short time. We have monitored the achievements and shortcomings of the new course over the past years and as a result of our findings have developed Lambdulus, an interactive and visual evaluator of λ calculus expressions that encourages students to explore the mechanisms of λ calculus by treating it not as a theoretical concept, but as a programming language in its own right.
On Julia's Efficient Algorithm for Subtyping Union Types and Covariant Tuples (Artifact).
Authors
Vitek, J.; Nardelli, F.; Chung, B.
Year
2019
Published
33rd European Conference on Object-Oriented Programming (ECOOP 2019). Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2019. p. 1-24. ISSN 1868-8969. ISBN 978-3-95977-111-5.
Type
Proceedings paper
Departments
Annotation
The Julia programming language supports multiple dispatch and provides a rich type annotation
language to specify method applicability. When multiple methods are applicable for a given call,
Julia relies on subtyping between method signatures to pick the correct method to invoke. Julia’s
subtyping algorithm is surprisingly complex, and determining whether it is correct remains an open
question. In this paper, we focus on one piece of this problem: the interaction between union
types and covariant tuples. Previous work normalized unions inside tuples to disjunctive normal
form. However, this strategy has two drawbacks: complex type signatures induce space explosion,
and interference between normalization and other features of Julia’s type system. In this paper,
we describe the algorithm that Julia uses to compute subtyping between tuples and unions – an
algorithm that is immune to space explosion and plays well with other features of the language. We
prove this algorithm correct and complete against a semantic-subtyping denotational model in Coq
Pattern-Based S-Expression Rewriting in Emacs
Authors
Culpepper, R.
Year
2019
Published
Proceedings of the 12th European Lisp Symposium (ELS 2019). New York: ACM, 2019. p. 8-10. ISSN 2677-3465. ISBN 978-2-9557474-3-8.
Type
Proceedings paper
Departments
Annotation
sexp-rewrite is an Emacs library for doing pattern-based rewriting of S-expression-structured code—ie, code in Lisp, Scheme, and
Racket. The library provides a simple but powerful pattern language
that enables users to define rewriting rules (called “tactics”) and
auxiliary nonterminals.
Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild
Authors
Křikava, F.; Vitek, J.; Miller, H.
Year
2019
Published
Volume 3, Issue OOPSLA October 2019. New York: ACM, 2019. p. 1-28. ISSN 2475-1421.
Type
Proceedings paper
DOI
Departments
Annotation
The Scala programming language offers two distinctive language features implicit parameters and implicit conversions, often referred together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.
Syntax Templates in Racket
Authors
Culpepper, R.
Year
2019
Published
Scheme and Functional Programming Workshop (Scheme '19). Birmingham: University of Alabama at Birmingham, 2019.
Type
Proceedings paper
Departments
Annotation
One foundation of Scheme macro systems is the notation of syntax patterns and templates. Patterns are used to match and destructure terms, and templates are used to construct new terms. This paper presents extensions to Racket’s template notation to support template splicing, conditional generation, and template metafunctions. The new template features complement syntax-parse’s previous extensions to the pattern-matching notation.
Online Abstraction with MDP Homomorphisms for Deep Learning
Authors
Bíža, O.; Platt, R.
Year
2019
Published
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. New York: ACM, 2019. p. 1125-1133. ISSN 2523-5699. ISBN 978-1-4503-6309-9.
Type
Proceedings paper
Departments
Annotation
Abstraction of Markov Decision Processes is a useful tool for solving complex problems, as it can ignore unimportant aspects of an environment, simplifying the process of learning an optimal policy. In this paper, we propose a new algorithm for finding abstractMDPs in environments with continuous state spaces. It is based on MDP homomorphisms, a structure-preserving mapping betweenMDPs. We demonstrate our algorithm’s ability to learn abstractions from collected experience and show how to reuse the abstractions to guide exploration in new tasks the agent encounters. Our novel task transfer method outperforms baselines based on a deep Q-network in the majority of our experiments. The source code is at https://github.com/ondrejba/aamas_19.
From Macros to DSLs: The Evolution of Racket
Authors
Culpepper, R.; Felleisen, M.; Flatt, M.; Krishnamurthi, S.
Year
2019
Published
3rd Summit on Advances in Programming Languages (SNAPL 2019). Saarbrücken: Dagstuhl Publishing,, 2019. p. 1-19. ISSN 1868-8969. ISBN 978-3-95977-113-9.
Type
Proceedings paper
Departments
Annotation
The Racket language promotes a language-oriented style of programming. Developers create many domain-specific languages, write programs in them, and compose these programs via Racket code. This style of programming can work only if creating and composing little languages is simple and effective. While Racket’s Lisp heritage might suggest that macros suffice, its design team discovered significant shortcomings and had to improve them in many ways. This paper presents the evolution of Racket’s macro system, including a false start, and assesses its current state.
R melts brains: an IR for first-class environments and lazy effectful arguments
Authors
Flückiger, O.; Chari, G.; Ječmen, J.; Yee, M.; Hain, J.; Vitek, J.
Year
2019
Published
Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages. ACM New York, 2019. p. 55-66. ISBN 978-1-4503-6996-1.
Type
Proceedings paper
Annotation
The R programming language combines a number of features considered hard to analyze and implement efficiently: dynamic typing, reflection, lazy evaluation, vectorized primitive types, first-class closures, and extensive use of native code. Additionally, variable scopes are reified at runtime as first-class environments. The combination of these features renders most static program analysis techniques impractical, and thus, compiler optimizations based on them ineffective. We present our work on PIR, an intermediate representation with explicit support for first-class environments and effectful lazy evaluation. We describe two dataflow analyses on PIR: the first enables reasoning about variables and their environments, and the second infers where arguments are evaluated. Leveraging their results, we show how to elide environment creation and inline functions.
On the Impact of Programming Languages on Code Quality: A Reproduction Study
Type
Article
DOI
Annotation
In a 2014 article, Ray, Posnett, Devanbu, and Filkov claimed to have uncovered a statistically significant association between 11 programming languages and software defects in 729 projects hosted on GitHub. Specifically, their work answered four research questions relating to software defects and programming languages. With data and code provided by the authors, the present article first attempts to conduct an experimental repetition of the original study. The repetition is only partially successful, due to missing code and issues with the classification of languages. The second part of this work focuses on their main claim, the association between bugs and languages, and performs a complete, independent reanalysis of the data and of the statistical modeling steps undertaken by Ray et al. in 2014. This reanalysis uncovers a number of serious flaws that reduce the number of languages with an association with defects down from 11 to only 4. Moreover, the practical effect size is exceedingly small. These results thus undermine the conclusions of the original study. Correcting the record is important, as many subsequent works have cited the 2014 article and have asserted, without evidence, a causal link between the choice of programming language for a given task and the number of software defects. Causation is not supported by the data at hand; and, in our opinion, even after fixing the methodological flaws we uncovered, too many unaccounted sources of bias remain to hope for a meaningful comparison of bug rates across languages.
Can android run on time? Extending and measuring the android platform's timeliness
Authors
Yan, Y.; Gokul, G.; Dantu, K.; Ko, S.Y.; Vitek, J.; ZIAREK, L.
Year
2019
Published
ACM Transactions on Embedded Computing Systems. 2019, 17(6), ISSN 1539-9087.
Type
Article
DOI
Departments
Annotation
Time predictability is dificult to achieve in the complex, layered execution environments that are common inmodern embedded devices such as smartphones. We explore adopting the Android programming model for arange of embedded applications that extends beyond mobile devices, under the constraint that changes towidely used libraries should be minimized. The challenges we explore include: the interplay between real-timeactivities and the rest of the system, how to express the timeliness requirements of components, and how wellthose requirements can be met on stock embedded platforms. We detail the design and implementation of ourmodifications to the Android framework along with a real-time VM and OS, and provide experimental datavalidating feasibility over five applications.
How to Evaluate the Performance of Gradual Type Systems
Authors
Vitek, J.; Greeman, B.; Takikawa, A.; New, M.S.; Feltey, D.; Findler, R.B.; Felleisen, M.
Year
2019
Published
JOURNAL OF FUNCTIONAL PROGRAMMING. 2019, 29 ISSN 0956-7968.
Type
Article
Departments
Annotation
A sound gradual type system ensures that untyped components of a program can never break the guarantees of statically typed components. This assurance relies on runtime checks, which in turn impose performance overhead in proportion to the frequency and nature of interaction between typed and untyped components. The literature on gradual typing lacks rigorous descriptions of methods for measuring the performance of gradual type systems. This gap has consequences for the implementors of gradual type systems and developers who use such systems. Without systematic evaluation of mixed-typed programs, implementors cannot precisely determine how improvements to a gradual type system affect performance. Developers cannot predict whether adding types to part of a program will significantly degrade (or improve) its performance. This paper presents the first method for evaluating the performance of sound gradual type systems. The method quantifies both the absolute performance of a gradual type system and the relative performance of two implementations of the same gradual type system. To validate the method, the paper reports on its application to twenty programs and three implementations of Typed Racket.
Feature-specific profiling
Authors
Vitek, J.; Andersen, L.; St-Amour, V.; Felleisen, M.
Year
2019
Published
ACM Transactions on Programming Languages and Systems. 2019, 41(1), ISSN 0164-0925.
Type
Article
DOI
Departments
Annotation
While high-level languages come with significant readability and maintainability benefits, their performance remains difficult to predict. For example, programmers may unknowingly use language features inappropriately, which cause their programs to run slower than expected. To address this issue, we introduce feature-specific profiling, a technique that reports performance costs in terms of linguistic constructs. Feature-specific profilers help programmers find expensive uses of specific features of their language. We describe the architecture of a profiler that implements our approach, explain prototypes of the profiler for two languages with different characteristics and implementation strategies, and provide empirical evidence for the approach's general usefulness as a performance debugging tool. © 2018 Copyright held by the owner/author(s).
Julia: Dynamism and Performance Reconciled by Design
Authors
Vitek, J.; Bezanson, J.; Chung, B.; Karpinski, S.; Shah, V.B.; Zoubritski, L.
Year
2018
Published
Journal Proceedings of the ACM on Programming Languages,Volume 2 Issue, OOPSLA. New York: ACM, 2018. ISSN 2475-1421.
Type
Proceedings paper
DOI
Departments
Annotation
Julia is a programming language for the scientific community that combines features of productivity languages, such as Python or MATLAB, with characteristics of performance-oriented languages, such as C++ or Fortran. Julia's productivity features include: dynamic typing, automatic memory management, rich type annotations, and multiple dispatch. At the same time, Julia allows programmers to control memory layout and leverages a specializing just-in-time compiler to eliminate much of the overhead of those features. This paper details the design choices made by the creators of Julia and reflects on the implications of those choices for performance and usability.
Julia Subtyping: A Rational Reconstruction
Authors
Vitek, J.; Chung, B.; Belyakova, Y.; Pelenitsyn, A.; Bezanson, J.; Nardelli, F.Z.
Year
2018
Published
Journal Proceedings of the ACM on Programming Languages,Volume 2 Issue, OOPSLA. New York: ACM, 2018. ISSN 2475-1421.
Type
Proceedings paper
DOI
Departments
Annotation
Programming languages that support multiple dispatch rely on an expressive notion of subtyping to specify method applicability. In these languages, type annotations on method declarations are used to select, out of a potentially large set of methods, the one that is most appropriate for a particular tuple of arguments. Julia is a language for scientific computing built around multiple dispatch and an expressive subtyping relation. This paper provides the first formal definition of Julia's subtype relation and motivates its design. We validate our specification empirically with an implementation of our definition that we compare against the existing Julia implementation on a collection of real-world programs. Our subtype implementation differs on 122 subtype tests out of 6014476. The first 120 differences are due to a bug in Julia that was fixed once reported; the remaining 2 are under discussion.
Self-contained development environments
Authors
Chari, G.; Vitek, J.; Pimás, J.; Flückiger, O.
Year
2018
Published
DLS 2018 Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2018. p. 76-87. ISBN 9781450360302.
Type
Proceedings paper
Departments
Annotation
Operating systems are traditionally implemented in lowlevel, performance-oriented programming languages. These languages typically rely on minimal runtime support and
provide unfettered access to the underlying hardware. Tradition has benefits: developers control the resources that the operating system manages and few performance bottlenecks cannot be overcome with clever feats of programming.
On the other hand, this makes operating systems harder to understand and maintain. Furthermore, those languages
have few built-in barriers against bugs. This paper is an experiment in side-stepping operating systems, and pushing
functionality into the runtime of high-level programming
languages. The question we try to answer is how much support is needed to run an application written in, say, Smalltalk
or Python on bare metal, that is, with no underlying operating system. We present a framework named NopSys that
allows this, and we validate it with the implementation of
CogNos a Smalltalk virtual machine running on bare x86
hardware. Experimental results suggest that this approach
is promising.
Contextual Equivalence fo probabilistic languages
Authors
Culpepper, R.; Wand, M.; Cobb, A.; Giannakopoulos, T.
Year
2018
Published
Journal Proceedings of the ACM on Programming Languages,Volume 2, Issue ICFP. New York: ACM, 2018. ISSN 2475-1421.
Type
Proceedings paper
DOI
Departments
Annotation
We present a complete reasoning principle for contextual equivalence in an untyped probabilistic language. The language includes continuous (real-valued) random variables, conditionals, and scoring. It also includes recursion, since the standard call-by-value fixpoint combinator is expressible.
We demonstrate the usability of our characterization by proving several equivalence schemas, including familiar facts from lambda calculus as well as results specific to probabilistic programming. In particular, we use it to prove that reordering the random draws in a probabilistic program preserves contextual equivalence. This allows us to show, for example, that
(let x = e1 in let y = e2 in e0) =ctx (let y = e2 in let x = e1 in e0)
(provided x does not occur free in e2 and y does not occur free in e1) despite the fact that e1 and e2 may have sampling and scoring effects.
Monotonic gradual typing in a common calculus
Authors
Chung, B.; Vitek, J.
Year
2018
Published
Proceeding ISSTA '18 Companion Proceedings for the ISSTA/ECOOP 2018 Workshops. New York: ACM, 2018. p. 17-23. ISBN 978-1-4503-5939-9.
Type
Proceedings paper
Departments
Annotation
Gradual typing refers to the notion that programs can be incrementally decorated with type annotations. Languages that support this approach to software development allow for programs being in various states of "typedness" on a scale ranging from entirely untyped to fully statically typed. Points in the middle of this typed-untyped scale create interactions between typed and untyped code, which is where gradual type systems differ. Each gradual type system comes with tradeoffs. Some systems provide strong guarantees at the expense of vastly degraded performance; others do not impact the running time of programs, but they do little to prevent type errors. This paper looks at an intriguing point in the landscape of these systems: the monotonic semantics. The monotonic semantics is designed to reduce the overhead of typed field access through a different enforcement mechanism compared to other gradual type systems. In our previous paper, [1], we described four semantics for gradual typing. This paper uses the framework of that companion paper to present and explore a formulation for the monotonic semantics. In comparison to the others, the monotonic semantics is designed to reduce the overhead of typed field access. We translate a common gradually typed source language to a common statically typed target language according to the monotonic semantics, allowing easy comparison to other gradual type systems in our framework.
Tests from traces: Automated unit test extraction for R
Authors
Year
2018
Published
ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM PRESS, 2018. p. 232-241. ISBN 978-1-4503-5699-2.
Type
Proceedings paper
Departments
Annotation
Unit tests are labor-intensive to write and maintain. This paper looks into how well unit tests for a target software package can be extracted from the execution traces of client code. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests, and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test extraction, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism, coerciveness, and lack of types. This combination decrease the efficacy of traditional test extraction techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,545 packages comprising 1.7M lines of code. The tests extracted by Genthat improved code coverage from the original rather low value of 267,496 lines to 700,918 lines. The running time of the generated tests is 1.9 times faster than the code they came from. © 2018 Association for Computing Machinery.
KafKa: Gradual typing for objects
Authors
Chung, B.; Li, P.; Nardelli, F.Z.; Vitek, J.
Year
2018
Published
Leibniz International Proceedings in Informatics, LIPIcs. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2018. ISBN 978-3-95977-079-8.
Type
Proceedings paper
Departments
Annotation
A wide range of gradual type systems have been proposed, providing many languages with the ability to mix typed and untyped code. However, hiding under language details, these gradual type systems embody fundamentally different ideas of what it means to be well-typed. In this paper, we show that four of the most common gradual type systems provide distinct guarantees, and we give a formal framework for comparing gradual type systems for object-oriented languages. First, we show that the different gradual type systems are practically distinguishable via a three-part litmus test. We present a formal framework for defining and comparing gradual type systems. Within this framework, different gradual type systems become translations between a common source and target language, allowing for direct comparison of semantics and guarantees. © Benjamin Chung, Paley Li, Francesco Zappa Nardelli, and Jan Vitek.
Correctness of a concurrent object collector for actor languages
Authors
Franco, J.; Clebsch, S.; Drossopoulou, S.; Vitek, J.
Year
2018
Published
Lecture Notes in Computer Science book series. Düsseldorf: Springer-VDI-Verlag, 2018. p. 885-911. ISBN 978-3-319-89883-4.
Type
Proceedings paper
Departments
Annotation
ORCA is a garbage collection protocol for actor-based programs. Multiple actors may mutate the heap while the collector is running without any dedicated synchronisation. ORCA is applicable to any actor language whose type system prevents data races and which supports causal message delivery. We present a model of ORCA which is parametric to the host language and its type system. We describe the interplay between the host language and the collector. We give invariants preserved by ORCA, and prove its soundness and completeness. © The Author(s) 2018.
Correctness of speculative optimizations with dynamic deoptimization
Authors
Vitek, J.; Fluckiger, O.; Scherer, G.; Yee, M.; Goel, A.; Ahmed, A.
Year
2018
Published
Proceedings of the ACM on Programming Languages (PACMPL). 2018, 2 ISSN 2475-1421.
Type
Article
DOI
Departments
Annotation
High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition.
Parallelizing Julia with a Non-invasive DSL
Authors
Anderson, T A; Liu, H; Kuper, L; Totoni, E; Vitek, J.; Shpeisman, T
Year
2017
Published
31st European Conference on Object-Oriented Programming (ECOOP 2017). Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2017. p. 41-429. ISSN 1868-8969. ISBN 9783959770354.
Type
Proceedings paper
Departments
Annotation
Computational scientists often prototype software using productivity languages that offer high-level programming abstractions. When higher performance is needed, they are obliged to rewrite their code in a lower-level efficiency language. Different solutions have been proposed to address this trade-off between productivity and efficiency. One promising approach is to create embedded domain-specific languages that sacrifice generality for productivity and performance, but practical experience with DSLs points to some road blocks preventing widespread adoption. This paper proposes a non-invasive domain-specific language that makes as few visible changes to the host programming model as possible. We present ParallelAccelerator, a library and compiler for high-level, high-performance scientific computing in Julia. ParallelAccelerator's programming model is aligned with existing Julia programming idioms. Our compiler exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code. Programs can also run in "library-only" mode, letting users benefit from the full Julia environment and libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.
Hadoop-Benchmark: Rapid Prototyping and Evaluation of Self-Adaptive Behaviors in Hadoop Clusters
Authors
Zhang, B.; Křikava, F.; Rouvoy, R.; Senturier, L.
Year
2017
Published
2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). USA: IEEE Computer Society, 2017. p. 175-181. ISBN 978-1-5386-1550-8.
Type
Proceedings paper
Departments
Annotation
Optimizing Hadoop executions has attracted a lot
of research contributions in particular in the domain of self-
adaptive software systems. However, these research efforts are
often hindered by the complexity of Hadoop operation and the
difficulty to reproduce experimental evaluations that makes it
hard to compare different approaches to one another.
To address this limitation, we propose a research acceleration
platform for rapid prototyping and evaluation of self-adaptive
behavior in Hadoop clusters. Essentially, it provides automated
approach to provision reproducible Hadoop environments and
execute acknowledged benchmarks. It is based on the state-
of-the-art container technology that supports both distributed
configurations as well as standalone single-host setups. We
demonstrate the approach on a complete implementation of a
concrete Hadoop self-adaptive case study.
Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology
Authors
Zakowski, Y; Cachera, D; Demange, D; Petri, G; Pichardie, D; Jagannathan, S; Vitek, J.
Year
2017
Published
International Conference on Interactive Theorem Prooving (ITP). Berlin: Springer-Verlag, 2017.
Type
Proceedings paper
Departments
Making Android Run on Time
Authors
Yan, Y; Dantu, K; Ko, S Y; Ziarek, L; Vitek, J.
Year
2017
Published
Real-Time and Embedded technology and Applications Symposium (RTAS 2017). San Francisco: American Institute of Physics and Magnetic Society of the IEEE, 2017.
Type
Proceedings paper
Departments
Helenos: A Realistic Benchmark for Distributed Transactional Memory
Authors
Kobyliński, P.; Siek, K.; Baranowski, J.; Wojciechowski, P.
Year
2017
Published
Software - Practice and Experience. 2017, 48(3), 528-549. ISSN 0038-0644.
Contact person
Ing. Dana Tomášková
Bursar of the Programming Research Lab
- +420224357971
- dana.tomaskova@fit.cvut.cz
- TH:A-1250
Where to find us
Programming Research Laboratory
Faculty of Information Technology
Czech Technical University in Prague
Rooms TH:A-1250, 1252, 1254
(Building A, 12th floor)
Thákurova 7
160 00 Prague 6 – Dejvice