Laboratoř výzkumu programování (PRL@PRG)

V Laboratoři výzkumu programování (PRL@PRG) se zabýváme hlubším pochopením programovacích jazyků a jejich využitím. Laboratoř vede prof. Christoph Kirsch z University of Salzburg v Rakousku.

Více o nás

Čemu se věnujeme?

Co máme za sebou?

Jak odhalit datové chyby v programovacích jazycích? Výzkum našeho vědce získal ocenění Seal of Excellence

Výzkum programovacích jazyků na FIT vede prof. Kirsch

Všechny příběhy

Proč jsou aktualizace programovacích jazyků tak drahé a jak je zlevnit? Naši vědci z FIT ČVUT mají řešení

Vedoucí pracoviště

prof. Christoph Kirsch

Vedoucí Laboratoře výzkumu programování

Tajemnice

Ing. Dana Tomášková

Tajemnice Laboratoře výzkumu programování

Vědečtí pracovníci

prof. Jan Vitek, MSc., Ph.D.

Pierre Donat-Bouillud, Ph.D.

Sebastián Krynski, MSc.

doc. Ing. Filip Křikava, Ph.D.

Ing. Petr Máj, Ph.D.

Činnost laboratoře se dotýká těchto témat

Výzkum programovacích jazyků

V rámci projektu BigCode vytvoříme na naší fakultě Institut pro škálovatelnou analýzu kódu (ISCA). Půjde o první vědecko-výzkumný projekt v ČR zaměřený na analýzu rozsáhlých bází programových kódů na Internetu.

Projekty

Rigorous Engineering of Data Analysis Pipelines (RiGiD)

Program

Grantové projekty excelence v základním výzkumu EXPRO

Poskytovatel

Grantová agentura České republiky

Pracoviště

Laboratoř výzkumu programování
Oddělení pro vědu a výzkum

Řešitelé

prof. Jan Vitek, MSc., Ph.D.

Kód

GX23-07580X

Období

2023 - 2027

Popis

The RiGiD project lays the groundwork for this research programme and aims to develop a methodology for rigorous engineering of data analysis pipelines that can be adopted in practice. Our approach is pragmatic. Rather than chasing functional correctness, we hope to substantially reduce the incidence of errors in the wild. The research is structured in three overlapping chapters. First, identify the problem by carrying out user studies and large-scale program analysis of a corpus of over 100,000 data science pipelines. The outcome will be a catalog of error patterns as well as a labeled dataset to be shared with other researchers. The technical advances will focus on combining dynamic and static program analysis to approximate the behavior of partial programs and programs written in highly dynamic languages. The second part of our effort proposes a methodology and tooling for developing data sciences codes with reduced error rates. The technical contributions of this part of the project focus on lightweight specification techniques and, in particular, the development of a novel gradual typing system that deals with common programming idioms found in our corpus. This includes various forms of object orientation, data frames, and rich value specifications. These specifications are complemented with an automated test generation technique that combines test and input synthesis with fuzzing and test minimization. Finally, the execution environment is extended to support automatic reproducibility and result audits through data lineage. The third and last part of the work evaluates the proposal by conducting user studies and developing tools for automating deployment. The contribution will be a qualitative and quantitative assessment of the RiGiD methodology and tooling. The technical contribution will be tools that leverage program analysis to infer approximate specifications to assist deployment and adoption. Our tools target R, a language for data analytics with 2 milli

Evolving Language Ecosystems

Program

Horizon 2020

Poskytovatel

Evropská komise

Pracoviště

Katedra teoretické informatiky
Laboratoř výzkumu programování
Oddělení pro rozvoj

Řešitelé

prof. Jan Vitek, MSc., Ph.D.

Kód

695412

Období

2016 - 2022

Popis

The Evolving Language Ecosystems project explores the fundamental techniques and algorithms for evolving programming languages and their ecosystems. Our purpose is to reduce the cost of wide-ranging language changes and obviate the need for devising entirely new languages. Our findings will grant both researchers and practitioners a greater degree of freedom when experimenting with new ideas on how to express computation.

Všechny projekty

Big Code: Škálovatelná analýza rozsáhlých bází programů

Program

OPVVV - Operační program Výzkum, vývoj a vzdělávání - Strukturální fondy EU

Poskytovatel

Evropská komise

Pracoviště

Katedra aplikované matematiky
Laboratoř výzkumu programování

Řešitelé

prof. Jan Vitek, MSc., Ph.D.

Kód

EF15_003/0000421, CZ.02.1.01/0.0/0.0/15_003/0000421

Období

2019 - 2022

Popis

V rámci projektu BigCode bude na Českém vysokém učení technickém v Praze, Fakultě informačních technologií (FIT) vytvořen Institut pro škálovatelnou analýzu kódu (ISCA), první výzkumné centrum v ČR zaměřené na analýzu rozsáhlých bází programových kódů na internetu, které představují obrovský znalostní potenciál, který ale neumíme zatím využít. Projekt BigCode si klade za cíl tuto bázi pomocí technik programovacích jazyků a statistického strojového učení zanalyzovat a umožnit porozumění získaným informacím. Požadované prostředky budou použity na vybudování hardwarové a softwarové infrastruktury centra a na mzdové prostředky pro mezinárodně uznávané výzkumné pracovníky. Výzkumná náplň projektu je v souladu s již existujícím výzkumem na FIT -- softwarové a znalostní inženýrství, vytěžování dat, paralelní výpočty. Díky mezinárodním kontaktům členů výzkumného týmu projektu BigCode centrum následně předpokládá získávání podpory od předních softwarových firem, jako jsou Google a Oracle.

Publikace

Deoptless: Speculation with Dispatched On-Stack Replacement and Specialized Continuations

Autoři

Flückiger, O.; Ječmen, J.; Krynski, S.; Vitek, J.

Rok

2022

Publikováno

PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation. New York: Association for Computing Machinery, 2022. p. 749-761. ISBN 978-1-4503-9265-5.

Typ

Stať ve sborníku

DOI

10.1145/3519939.3523729

Pracoviště

Katedra teoretické informatiky
Laboratoř výzkumu programování

Anotace

Just-in-time compilation provides significant performance improvements for programs written in dynamic languages. These benefits come from the ability of the compiler to speculate about likely cases and generate optimized code for these. Unavoidably, speculations sometimes fail and the optimizations must be reverted. In some pathological cases, this can leave the program stuck with suboptimal code. In this paper we propose deoptless, a technique that replaces deoptimization points with dispatched specialized continuations. The goal of deoptless is to take a step towards providing users with a more transparent performance model in which mysterious slowdowns are less frequent and grave.

Type stability in Julia: Avoiding performance pathologies in JIT compilation

Autoři

Pelenitsyn, A.; Belyakova, J.; Chung, B.; Tate, R.; Vitek, J.

Rok

2021

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5 1-26. ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3485527

Pracoviště

Laboratoř výzkumu programování

Anotace

As a scientific programming language, Julia strives for performance but also provides high-level productivity features. To avoid performance pathologies, Julia users are expected to adhere to a coding discipline that enables so-called type stability. Informally, a function is type stable if the type of the output depends only on the types of the inputs, not their values. This paper provides a formal definition of type stability as well as a stronger property of type groundedness, shows that groundedness enables compiler optimizations, and proves the compiler correct. We also perform a corpus analysis to uncover how these type-related properties manifest in practice.

Všechny publikace

Promises Are Made To Be Broken: Migrating R to Strict Semantics

Autoři

Goel, A.; Ječmen, J.; Krynski, S.; Flückiger, O.; Vitek, J.

Rok

2021

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), 1-20. ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3485478

Pracoviště

Katedra teoretické informatiky
Laboratoř výzkumu programování

Anotace

Function calls in the R language do not evaluate their arguments, these are passed to the callee as suspended computations and evaluated if needed. After 25 years of experience with the language, there are very few cases where programmers leverage delayed evaluation intentionally and laziness comes at a price in performance and complexity. This paper explores how to evolve the semantics of a lazy language towards strictness-by-default and laziness-on-demand. To provide a migration path, it is necessary to provide tooling for developers to migrate libraries without introducing errors. This paper reports on a dynamic analysis that infers strictness signatures for functions to capture both intentional and accidental laziness. Over 99% of the inferred signatures were correct when tested against clients of the libraries.

What we eval in the shadows: A large-scale study of eval in R programs

Autoři

Goel, A.; Donat-Bouillud, P.; Křikava, F.; Kirsch, C.; Vitek, J.

Rok

2021

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(OOPSLA), ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3485502

Pracoviště

Laboratoř výzkumu programování

Anotace

Most dynamic languages allow users to turn text into code using various functions, often named eval, with language-dependent semantics. The widespread use of these reflective functions hinders static analysis and prevents compilers from performing optimizations. This paper aims to provide a better sense of why programmers use eval. Understanding why eval is used in practice is key to finding ways to mitigate its negative impact. We have reasons to believe that reflective feature usage is language and application domain-specific; we focus on data science code written in R and compare our results to previous work that analyzed web programming in JavaScript. We analyze 49,296,059 calls to eval from 240,327 scripts extracted from 15,401 R packages. We find that eval is indeed in widespread use; R's eval is more pervasive and arguably dangerous than what was previously reported for JavaScript.

First-class environments in R

Autoři

Goel, A.; Vitek, J.

Rok

2021

Publikováno

DLS 2021: Proceedings of the 17th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2021. p. 12-22. ISBN 9781450391054.

Typ

Stať ve sborníku

DOI

10.1145/3486602.3486768

Pracoviště

Laboratoř výzkumu programování

Anotace

The R programming language is widely used for statistical computing. To enable interactive data exploration and rapid prototyping, R encourages a dynamic programming style. This programming style is supported by features such as first-class environments. Amongst widely used languages, R has the richest interface for programmatically manipulating environments. With the flexibility afforded by reflective operations on first-class environments, come significant challenges for reasoning and optimizing user-defined code. This paper documents the reflective interface used to operate over first-class environment. We explain the rationale behind its design and conduct a large-scale study of how the interface is used in popular libraries. © 2021 Owner/Author.

Formally verified speculation and deoptimization in a JIT compiler

Autoři

Barrière, A.; Blazy, S.; Flückiger, O.; Pichardie, D.; Vitek, J.

Rok

2021

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2021, 5(POPL), 1-26. ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3434327

Pracoviště

Laboratoř výzkumu programování

Anotace

Just-in-time compilers for dynamic languages routinely generate code under assumptions that may be invalidated at run-time, this allows for specialization of program code to the common case in order to avoid unnecessary overheads due to uncommon cases. This form of software speculation requires support for deoptimization when some of the assumptions fail to hold. This paper presents a model just-in-time compiler with an intermediate representation that explicits the synchronization points used for deoptimization and the assumptions made by the compiler's speculation. We also present several common compiler optimizations that can leverage speculation to generate improved code. The optimizations are proved correct with the help of a proof assistant. While our work stops short of proving native code generation, we demonstrate how one could use the verified optimization to obtain significant speed ups in an end-to-end setting. © 2021 Owner/Author.

Contextual Dispatch for Function Specialization

Autoři

Flückiger, O.; Chari, G.; Yee, M.-H.; Ječmen, J.; Vitek, J.

Rok

2020

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-24. ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3428288

Pracoviště

Katedra teoretické informatiky
Laboratoř výzkumu programování

Anotace

In order to generate efficient code, dynamic language compilers often need information, such as dynamic types, not readily available in the program source. Leveraging a mixture of static and dynamic information, these compilers speculate on the missing information. Within one compilation unit, they specialize the generated code to the previously observed behaviors, betting that past is prologue. When speculation fails, the execution must jump back to unoptimized code. In this paper, we propose an approach to further the specialization, by disentangling classes of behaviors into separate optimization units. With contextual dispatch, functions are versioned and each version is compiled under different assumptions. When a function is invoked, the implementation dispatches to a version optimized under assumptions matching the dynamic context of the call. As a proof-of-concept, we describe a compiler for the R language which uses this approach. Our implementation is, on average, 1.7× faster than the GNU R reference implementation. We evaluate contextual dispatch on a set of benchmarks and measure additional speedup, on top of traditional speculation with deoptimization techniques. In this setting contextual dispatch improves the performance of 18 out of 46 programs in our benchmark suite.

Sampling optimized code for type feedback

Autoři

Flückiger, O.; Wälchli, A.; Krynski, S.; Vitek, J.

Rok

2020

Publikováno

DSL_Proceedings of the 16th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2020. p. 99-111. ISBN 978-1-4503-8175-8.

Typ

Stať ve sborníku

DOI

10.1145/3426422.3426984

Pracoviště

Laboratoř výzkumu programování

Anotace

To efficiently execute dynamically typed languages, many language implementations have adopted a two-tier architecture. The first tier aims for low-latency startup times and collects dynamic profiles, such as the dynamic types of variables. The second tier provides high-throughput using an optimizing compiler that specializes code to the recorded type information. If the program behavior changes to the point that not previously seen types occur in specialized code, that specialized code becomes invalid, it is deoptimized, and control is transferred back to the first tier execution engine which will start specializing anew. However, if the program behavior becomes more specific, for instance, if a polymorphic variable becomes monomorphic, nothing changes. Once the program is running optimized code, there are no means to notice that an opportunity for optimization has been missed. We propose to employ a sampling-based profiler to monitor native code without any instrumentation. The absence of instrumentation means that when the profiler is not active, no overhead is incurred. We present an implementation is in the context of the A just-in-time, optimizing compiler for the R language. Based on the sampled profiles, we are able to detect when the native code produced by A is specialized for stale type feedback and recompile it to more type-specific code. We show that sampling adds an overhead of less than 3 © 2020 ACM.

World age in Julia: Optimizing method dispatch in the presence of eval

Autoři

Belyakova, J.; Chung, B.; Gelinas, J.; Nash, J.; Vitek, J.

Rok

2020

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-26. ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3428275

Pracoviště

Laboratoř výzkumu programování

Anotace

Dynamic programming languages face semantic and performance challenges in the presence of features, such as eval, that can inject new code into a running program. The Julia programming language introduces the novel concept of world age to insulate optimized code from one of the most disruptive side-effects of eval: changes to the definition of an existing function. This paper provides the first formal semantics of world age in a core calculus named juliette, and shows how world age enables compiler optimizations, such as inlining, in the presence of eval. While Julia also provides programmers with the means to bypass world age, we found that this mechanism is not used extensively: a static analysis of over 4,000 registered Julia packages shows that only 4-9% of packages bypass world age. This suggests that Julia's semantics aligns with programmer expectations. © 2020 Owner/Author.

Designing types for R, empirically

Autoři

Turcotte, A.; Goel, A.; Křikava, F.; Vitek, J.

Rok

2020

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2020, 4(OOPSLA), 1-25. ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3428249

Pracoviště

Laboratoř výzkumu programování

Anotace

The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. The starting point for our work are the two questions: what expressive power is needed to accurately type R code? and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insights in the degree of polymorphism exhibited by idiomatic R code and explore potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

On the Design, Implementation, and Use of Laziness in R

Autoři

Vitek, J.; Goel, A.

Rok

2019

Publikováno

Volume 3, Issue OOPSLA. New York: ACM, 2019. p. 1-27. ISSN 2475-1421.

Typ

Stať ve sborníku

DOI

10.1145/3360579

Pracoviště

Laboratoř výzkumu programování

Anotace

The R programming language has been lazy for over twenty-five years. This paper presents a review of the design and implementation of call-by-need in R, and a data-driven study of how generations of programmers have put laziness to use in their code. We analyze 16,707 packages and observe the creation of 270.9 B promises. Our data suggests that there is little supporting evidence to assert that programmers use laziness to avoid unnecessary computation or to operate over infinite data structures. For the most part R code appears to have been written without reliance on, and in many cases even knowledge of, delayed argument evaluation. The only significant exception is a small number of packages which leverage call-by-need for meta-programming.

Lambdulus: Teaching Lambda Calculus Practically

Autoři

Máj, P.; Sliacký, J.

Rok

2019

Publikováno

SPLASH-E 2019: Proceedings of the 2019 ACM SIGPLAN Symposium on SPLASH-E. New York: ACM, 2019. p. 57-65. ISBN 9781450369893.

Typ

Stať ve sborníku

DOI

10.1145/3358711.3361629

Pracoviště

Katedra teoretické informatiky
Fakulta informačních technologií
Laboratoř výzkumu programování

Anotace

λ calculus is a great formal introduction to functional programming. However, its abstract nature poses a challenge for many students as they struggle both with the unfamiliarity of functional programming and with the high abstraction and minimalism of λ calculus. As a result, functional programming classes are often delayed to older students who are hoped to be better prepared for appreciating its qualities, as was the case at the authors’ university as well. 3 years ago, as part of a redesigned curriculum around programming languages education, we have decided to introduce students to various programming paradigms much sooner than before. In doing so, we faced the problem of explaining very theoretical foundational concepts to very young students in a very short time. We have monitored the achievements and shortcomings of the new course over the past years and as a result of our findings have developed Lambdulus, an interactive and visual evaluator of λ calculus expressions that encourages students to explore the mechanisms of λ calculus by treating it not as a theoretical concept, but as a programming language in its own right.

On Julia's Efficient Algorithm for Subtyping Union Types and Covariant Tuples (Artifact).

Autoři

Vitek, J.; Nardelli, F.; Chung, B.

Rok

2019

Publikováno

33rd European Conference on Object-Oriented Programming (ECOOP 2019). Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2019. p. 1-24. ISSN 1868-8969. ISBN 978-3-95977-111-5.

Typ

Stať ve sborníku

DOI

10.4230/LIPIcs.ECOOP.2019.24

Pracoviště

Laboratoř výzkumu programování

Anotace

The Julia programming language supports multiple dispatch and provides a rich type annotation language to specify method applicability. When multiple methods are applicable for a given call, Julia relies on subtyping between method signatures to pick the correct method to invoke. Julia’s subtyping algorithm is surprisingly complex, and determining whether it is correct remains an open question. In this paper, we focus on one piece of this problem: the interaction between union types and covariant tuples. Previous work normalized unions inside tuples to disjunctive normal form. However, this strategy has two drawbacks: complex type signatures induce space explosion, and interference between normalization and other features of Julia’s type system. In this paper, we describe the algorithm that Julia uses to compute subtyping between tuples and unions – an algorithm that is immune to space explosion and plays well with other features of the language. We prove this algorithm correct and complete against a semantic-subtyping denotational model in Coq

Pattern-Based S-Expression Rewriting in Emacs

Autoři

Culpepper, R.

Rok

2019

Publikováno

Proceedings of the 12th European Lisp Symposium (ELS 2019). New York: ACM, 2019. p. 8-10. ISSN 2677-3465. ISBN 978-2-9557474-3-8.

Typ

Stať ve sborníku

DOI

10.5281/zenodo.2642093

Pracoviště

Laboratoř výzkumu programování

Anotace

sexp-rewrite is an Emacs library for doing pattern-based rewriting of S-expression-structured code—ie, code in Lisp, Scheme, and Racket. The library provides a simple but powerful pattern language that enables users to define rewriting rules (called “tactics”) and auxiliary nonterminals.

Scala Implicits are Everywhere: A large-scale study of the use of Implicits in the wild

Autoři

Křikava, F.; Vitek, J.; Miller, H.

Rok

2019

Publikováno

Volume 3, Issue OOPSLA October 2019. New York: ACM, 2019. p. 1-28. ISSN 2475-1421.

Typ

Stať ve sborníku

DOI

10.1145/3360589

Pracoviště

Laboratoř výzkumu programování

Anotace

The Scala programming language offers two distinctive language features implicit parameters and implicit conversions, often referred together as implicits. Announced without fanfare in 2004, implicits have quickly grown to become a widely and pervasively used feature of the language. They provide a way to reduce the boilerplate code in Scala programs. They are also used to implement certain language features without having to modify the compiler. We report on a large-scale study of the use of implicits in the wild. For this, we analyzed 7,280 Scala projects hosted on GitHub, spanning over 8.1M call sites involving implicits and 370.7K implicit declarations across 18.7M lines of Scala code.

Syntax Templates in Racket

Autoři

Culpepper, R.

Rok

2019

Publikováno

Scheme and Functional Programming Workshop (Scheme '19). Birmingham: University of Alabama at Birmingham, 2019.

Typ

Stať ve sborníku

Pracoviště

Laboratoř výzkumu programování

Anotace

One foundation of Scheme macro systems is the notation of syntax patterns and templates. Patterns are used to match and destructure terms, and templates are used to construct new terms. This paper presents extensions to Racket’s template notation to support template splicing, conditional generation, and template metafunctions. The new template features complement syntax-parse’s previous extensions to the pattern-matching notation.

Online Abstraction with MDP Homomorphisms for Deep Learning

Autoři

Bíža, O.; Platt, R.

Rok

2019

Publikováno

Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. New York: ACM, 2019. p. 1125-1133. ISSN 2523-5699. ISBN 978-1-4503-6309-9.

Typ

Stať ve sborníku

Pracoviště

Katedra aplikované matematiky

Anotace

Abstraction of Markov Decision Processes is a useful tool for solving complex problems, as it can ignore unimportant aspects of an environment, simplifying the process of learning an optimal policy. In this paper, we propose a new algorithm for finding abstractMDPs in environments with continuous state spaces. It is based on MDP homomorphisms, a structure-preserving mapping betweenMDPs. We demonstrate our algorithm’s ability to learn abstractions from collected experience and show how to reuse the abstractions to guide exploration in new tasks the agent encounters. Our novel task transfer method outperforms baselines based on a deep Q-network in the majority of our experiments. The source code is at https://github.com/ondrejba/aamas_19.

From Macros to DSLs: The Evolution of Racket

Autoři

Culpepper, R.; Felleisen, M.; Flatt, M.; Krishnamurthi, S.

Rok

2019

Publikováno

3rd Summit on Advances in Programming Languages (SNAPL 2019). Saarbrücken: Dagstuhl Publishing,, 2019. p. 1-19. ISSN 1868-8969. ISBN 978-3-95977-113-9.

Typ

Stať ve sborníku

DOI

10.4230/LIPIcs.SNAPL.2019.5

Pracoviště

Laboratoř výzkumu programování

Anotace

The Racket language promotes a language-oriented style of programming. Developers create many domain-specific languages, write programs in them, and compose these programs via Racket code. This style of programming can work only if creating and composing little languages is simple and effective. While Racket’s Lisp heritage might suggest that macros suffice, its design team discovered significant shortcomings and had to improve them in many ways. This paper presents the evolution of Racket’s macro system, including a false start, and assesses its current state.

R melts brains: an IR for first-class environments and lazy effectful arguments

Autoři

Flückiger, O.; Chari, G.; Ječmen, J.; Yee, M.; Hain, J.; Vitek, J.

Rok

2019

Publikováno

Proceedings of the 15th ACM SIGPLAN International Symposium on Dynamic Languages. ACM New York, 2019. p. 55-66. ISBN 978-1-4503-6996-1.

Typ

Stať ve sborníku

DOI

10.1145/3359619.3359744

Pracoviště

Katedra teoretické informatiky
Laboratoř výzkumu programování

Anotace

The R programming language combines a number of features considered hard to analyze and implement efficiently: dynamic typing, reflection, lazy evaluation, vectorized primitive types, first-class closures, and extensive use of native code. Additionally, variable scopes are reified at runtime as first-class environments. The combination of these features renders most static program analysis techniques impractical, and thus, compiler optimizations based on them ineffective. We present our work on PIR, an intermediate representation with explicit support for first-class environments and effectful lazy evaluation. We describe two dataflow analyses on PIR: the first enables reasoning about variables and their environments, and the second infers where arguments are evaluated. Leveraging their results, we show how to elide environment creation and inline functions.

On the Impact of Programming Languages on Code Quality: A Reproduction Study

Autoři

Vitek, J.; Máj, P.; Hollenbeck, C.; Berger, E.; Vitek, O.

Rok

2019

Publikováno

ACM Transactions on Programming Languages and Systems,Vol. 41, No. 4. 2019, 41(4), 1-24. ISSN 0164-0925.

Typ

Článek

DOI

10.1145/3340571

Pracoviště

Katedra teoretické informatiky
Laboratoř výzkumu programování

Anotace

In a 2014 article, Ray, Posnett, Devanbu, and Filkov claimed to have uncovered a statistically significant association between 11 programming languages and software defects in 729 projects hosted on GitHub. Specifically, their work answered four research questions relating to software defects and programming languages. With data and code provided by the authors, the present article first attempts to conduct an experimental repetition of the original study. The repetition is only partially successful, due to missing code and issues with the classification of languages. The second part of this work focuses on their main claim, the association between bugs and languages, and performs a complete, independent reanalysis of the data and of the statistical modeling steps undertaken by Ray et al. in 2014. This reanalysis uncovers a number of serious flaws that reduce the number of languages with an association with defects down from 11 to only 4. Moreover, the practical effect size is exceedingly small. These results thus undermine the conclusions of the original study. Correcting the record is important, as many subsequent works have cited the 2014 article and have asserted, without evidence, a causal link between the choice of programming language for a given task and the number of software defects. Causation is not supported by the data at hand; and, in our opinion, even after fixing the methodological flaws we uncovered, too many unaccounted sources of bias remain to hope for a meaningful comparison of bug rates across languages.

Can android run on time? Extending and measuring the android platform's timeliness

Autoři

Yan, Y.; Gokul, G.; Dantu, K.; Ko, S.Y.; Vitek, J.; ZIAREK, L.

Rok

2019

Publikováno

ACM Transactions on Embedded Computing Systems. 2019, 17(6), ISSN 1539-9087.

Typ

Článek

DOI

10.1145/3289257

Pracoviště

Laboratoř výzkumu programování

Anotace

Time predictability is dificult to achieve in the complex, layered execution environments that are common inmodern embedded devices such as smartphones. We explore adopting the Android programming model for arange of embedded applications that extends beyond mobile devices, under the constraint that changes towidely used libraries should be minimized. The challenges we explore include: the interplay between real-timeactivities and the rest of the system, how to express the timeliness requirements of components, and how wellthose requirements can be met on stock embedded platforms. We detail the design and implementation of ourmodifications to the Android framework along with a real-time VM and OS, and provide experimental datavalidating feasibility over five applications.

How to Evaluate the Performance of Gradual Type Systems

Autoři

Vitek, J.; Greeman, B.; Takikawa, A.; New, M.S.; Feltey, D.; Findler, R.B.; Felleisen, M.

Rok

2019

Publikováno

JOURNAL OF FUNCTIONAL PROGRAMMING. 2019, 29 ISSN 0956-7968.

Typ

Článek

DOI

10.1017/S0956796818000217

Pracoviště

Laboratoř výzkumu programování

Anotace

A sound gradual type system ensures that untyped components of a program can never break the guarantees of statically typed components. This assurance relies on runtime checks, which in turn impose performance overhead in proportion to the frequency and nature of interaction between typed and untyped components. The literature on gradual typing lacks rigorous descriptions of methods for measuring the performance of gradual type systems. This gap has consequences for the implementors of gradual type systems and developers who use such systems. Without systematic evaluation of mixed-typed programs, implementors cannot precisely determine how improvements to a gradual type system affect performance. Developers cannot predict whether adding types to part of a program will significantly degrade (or improve) its performance. This paper presents the first method for evaluating the performance of sound gradual type systems. The method quantifies both the absolute performance of a gradual type system and the relative performance of two implementations of the same gradual type system. To validate the method, the paper reports on its application to twenty programs and three implementations of Typed Racket.

Feature-specific profiling

Autoři

Vitek, J.; Andersen, L.; St-Amour, V.; Felleisen, M.

Rok

2019

Publikováno

ACM Transactions on Programming Languages and Systems. 2019, 41(1), ISSN 0164-0925.

Typ

Článek

DOI

10.1145/3275519

Pracoviště

Laboratoř výzkumu programování

Anotace

While high-level languages come with significant readability and maintainability benefits, their performance remains difficult to predict. For example, programmers may unknowingly use language features inappropriately, which cause their programs to run slower than expected. To address this issue, we introduce feature-specific profiling, a technique that reports performance costs in terms of linguistic constructs. Feature-specific profilers help programmers find expensive uses of specific features of their language. We describe the architecture of a profiler that implements our approach, explain prototypes of the profiler for two languages with different characteristics and implementation strategies, and provide empirical evidence for the approach's general usefulness as a performance debugging tool. © 2018 Copyright held by the owner/author(s).

Julia: Dynamism and Performance Reconciled by Design

Autoři

Vitek, J.; Bezanson, J.; Chung, B.; Karpinski, S.; Shah, V.B.; Zoubritski, L.

Rok

2018

Publikováno

Journal Proceedings of the ACM on Programming Languages,Volume 2 Issue, OOPSLA. New York: ACM, 2018. ISSN 2475-1421.

Typ

Stať ve sborníku

DOI

10.1145/3276490

Pracoviště

Laboratoř výzkumu programování

Anotace

Julia is a programming language for the scientific community that combines features of productivity languages, such as Python or MATLAB, with characteristics of performance-oriented languages, such as C++ or Fortran. Julia's productivity features include: dynamic typing, automatic memory management, rich type annotations, and multiple dispatch. At the same time, Julia allows programmers to control memory layout and leverages a specializing just-in-time compiler to eliminate much of the overhead of those features. This paper details the design choices made by the creators of Julia and reflects on the implications of those choices for performance and usability.

Julia Subtyping: A Rational Reconstruction

Autoři

Vitek, J.; Chung, B.; Belyakova, Y.; Pelenitsyn, A.; Bezanson, J.; Nardelli, F.Z.

Rok

2018

Publikováno

Journal Proceedings of the ACM on Programming Languages,Volume 2 Issue, OOPSLA. New York: ACM, 2018. ISSN 2475-1421.

Typ

Stať ve sborníku

DOI

10.1145/3276483

Pracoviště

Laboratoř výzkumu programování

Anotace

Programming languages that support multiple dispatch rely on an expressive notion of subtyping to specify method applicability. In these languages, type annotations on method declarations are used to select, out of a potentially large set of methods, the one that is most appropriate for a particular tuple of arguments. Julia is a language for scientific computing built around multiple dispatch and an expressive subtyping relation. This paper provides the first formal definition of Julia's subtype relation and motivates its design. We validate our specification empirically with an implementation of our definition that we compare against the existing Julia implementation on a collection of real-world programs. Our subtype implementation differs on 122 subtype tests out of 6014476. The first 120 differences are due to a bug in Julia that was fixed once reported; the remaining 2 are under discussion.

Self-contained development environments

Autoři

Chari, G.; Vitek, J.; Pimás, J.; Flückiger, O.

Rok

2018

Publikováno

DLS 2018 Proceedings of the 14th ACM SIGPLAN International Symposium on Dynamic Languages. New York: ACM, 2018. p. 76-87. ISBN 9781450360302.

Typ

Stať ve sborníku

DOI

10.1145/3276945.3276948

Pracoviště

Laboratoř výzkumu programování

Anotace

Operating systems are traditionally implemented in lowlevel, performance-oriented programming languages. These languages typically rely on minimal runtime support and provide unfettered access to the underlying hardware. Tradition has benefits: developers control the resources that the operating system manages and few performance bottlenecks cannot be overcome with clever feats of programming. On the other hand, this makes operating systems harder to understand and maintain. Furthermore, those languages have few built-in barriers against bugs. This paper is an experiment in side-stepping operating systems, and pushing functionality into the runtime of high-level programming languages. The question we try to answer is how much support is needed to run an application written in, say, Smalltalk or Python on bare metal, that is, with no underlying operating system. We present a framework named NopSys that allows this, and we validate it with the implementation of CogNos a Smalltalk virtual machine running on bare x86 hardware. Experimental results suggest that this approach is promising.

Contextual Equivalence fo probabilistic languages

Autoři

Culpepper, R.; Wand, M.; Cobb, A.; Giannakopoulos, T.

Rok

2018

Publikováno

Journal Proceedings of the ACM on Programming Languages,Volume 2, Issue ICFP. New York: ACM, 2018. ISSN 2475-1421.

Typ

Stať ve sborníku

DOI

10.1145/3236782

Pracoviště

Laboratoř výzkumu programování

Anotace

We present a complete reasoning principle for contextual equivalence in an untyped probabilistic language. The language includes continuous (real-valued) random variables, conditionals, and scoring. It also includes recursion, since the standard call-by-value fixpoint combinator is expressible. We demonstrate the usability of our characterization by proving several equivalence schemas, including familiar facts from lambda calculus as well as results specific to probabilistic programming. In particular, we use it to prove that reordering the random draws in a probabilistic program preserves contextual equivalence. This allows us to show, for example, that (let x = e1 in let y = e2 in e0) =ctx (let y = e2 in let x = e1 in e0) (provided x does not occur free in e2 and y does not occur free in e1) despite the fact that e1 and e2 may have sampling and scoring effects.

Monotonic gradual typing in a common calculus

Autoři

Chung, B.; Vitek, J.

Rok

2018

Publikováno

Proceeding ISSTA '18 Companion Proceedings for the ISSTA/ECOOP 2018 Workshops. New York: ACM, 2018. p. 17-23. ISBN 978-1-4503-5939-9.

Typ

Stať ve sborníku

DOI

10.1145/3236454.3236475

Pracoviště

Laboratoř výzkumu programování

Anotace

Gradual typing refers to the notion that programs can be incrementally decorated with type annotations. Languages that support this approach to software development allow for programs being in various states of "typedness" on a scale ranging from entirely untyped to fully statically typed. Points in the middle of this typed-untyped scale create interactions between typed and untyped code, which is where gradual type systems differ. Each gradual type system comes with tradeoffs. Some systems provide strong guarantees at the expense of vastly degraded performance; others do not impact the running time of programs, but they do little to prevent type errors. This paper looks at an intriguing point in the landscape of these systems: the monotonic semantics. The monotonic semantics is designed to reduce the overhead of typed field access through a different enforcement mechanism compared to other gradual type systems. In our previous paper, [1], we described four semantics for gradual typing. This paper uses the framework of that companion paper to present and explore a formulation for the monotonic semantics. In comparison to the others, the monotonic semantics is designed to reduce the overhead of typed field access. We translate a common gradually typed source language to a common statically typed target language according to the monotonic semantics, allowing easy comparison to other gradual type systems in our framework.

Tests from traces: Automated unit test extraction for R

Autoři

Křikava, F.; Vitek, J.

Rok

2018

Publikováno

ISSTA 2018 - Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM PRESS, 2018. p. 232-241. ISBN 978-1-4503-5699-2.

Typ

Stať ve sborníku

DOI

10.1145/3213846.3213863

Pracoviště

Laboratoř výzkumu programování

Anotace

Unit tests are labor-intensive to write and maintain. This paper looks into how well unit tests for a target software package can be extracted from the execution traces of client code. Our objective is to reduce the effort involved in creating test suites while minimizing the number and size of individual tests, and maximizing coverage. To evaluate the viability of our approach, we select a challenging target for automated test extraction, namely R, a programming language that is popular for data science applications. The challenges presented by R are its extreme dynamism, coerciveness, and lack of types. This combination decrease the efficacy of traditional test extraction techniques. We present Genthat, a tool developed over the last couple of years to non-invasively record execution traces of R programs and extract unit tests from those traces. We have carried out an evaluation on 1,545 packages comprising 1.7M lines of code. The tests extracted by Genthat improved code coverage from the original rather low value of 267,496 lines to 700,918 lines. The running time of the generated tests is 1.9 times faster than the code they came from. © 2018 Association for Computing Machinery.

KafKa: Gradual typing for objects

Autoři

Chung, B.; Li, P.; Nardelli, F.Z.; Vitek, J.

Rok

2018

Publikováno

Leibniz International Proceedings in Informatics, LIPIcs. Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik, 2018. ISBN 978-3-95977-079-8.

Typ

Stať ve sborníku

DOI

10.4230/LIPIcs.ECOOP.2018.12

Pracoviště

Laboratoř výzkumu programování

Anotace

A wide range of gradual type systems have been proposed, providing many languages with the ability to mix typed and untyped code. However, hiding under language details, these gradual type systems embody fundamentally different ideas of what it means to be well-typed. In this paper, we show that four of the most common gradual type systems provide distinct guarantees, and we give a formal framework for comparing gradual type systems for object-oriented languages. First, we show that the different gradual type systems are practically distinguishable via a three-part litmus test. We present a formal framework for defining and comparing gradual type systems. Within this framework, different gradual type systems become translations between a common source and target language, allowing for direct comparison of semantics and guarantees. © Benjamin Chung, Paley Li, Francesco Zappa Nardelli, and Jan Vitek.

Correctness of a concurrent object collector for actor languages

Autoři

Franco, J.; Clebsch, S.; Drossopoulou, S.; Vitek, J.

Rok

2018

Publikováno

Lecture Notes in Computer Science book series. Düsseldorf: Springer-VDI-Verlag, 2018. p. 885-911. ISBN 978-3-319-89883-4.

Typ

Stať ve sborníku

DOI

10.1007/978-3-319-89884-1_31

Pracoviště

Laboratoř výzkumu programování

Anotace

ORCA is a garbage collection protocol for actor-based programs. Multiple actors may mutate the heap while the collector is running without any dedicated synchronisation. ORCA is applicable to any actor language whose type system prevents data races and which supports causal message delivery. We present a model of ORCA which is parametric to the host language and its type system. We describe the interplay between the host language and the collector. We give invariants preserved by ORCA, and prove its soundness and completeness. © The Author(s) 2018.

Correctness of speculative optimizations with dynamic deoptimization

Autoři

Vitek, J.; Fluckiger, O.; Scherer, G.; Yee, M.; Goel, A.; Ahmed, A.

Rok

2018

Publikováno

Proceedings of the ACM on Programming Languages (PACMPL). 2018, 2 ISSN 2475-1421.

Typ

Článek

DOI

10.1145/3158137

Pracoviště

Laboratoř výzkumu programování

Anotace

High-performance dynamic language implementations make heavy use of speculative optimizations to achieve speeds close to statically compiled languages. These optimizations are typically performed by a just-in-time compiler that generates code under a set of assumptions about the state of the program and its environment. In certain cases, a program may execute code compiled under assumptions that are no longer valid. The implementation must then deoptimize the program on-the-fly; this entails finding semantically equivalent code that does not rely on invalid assumptions, translating program state to that expected by the target code, and transferring control. This paper looks at the interaction between optimization and deoptimization, and shows that reasoning about speculation is surprisingly easy when assumptions are made explicit in the program representation. This insight is demonstrated on a compiler intermediate representation, named sourir, modeled after the high-level representation for a dynamic language. Traditional compiler optimizations such as constant folding, unreachable code elimination, and function inlining are shown to be correct in the presence of assumptions. Furthermore, the paper establishes the correctness of compiler transformations specific to deoptimization: namely unrestricted deoptimization, predicate hoisting, and assume composition.

Parallelizing Julia with a Non-invasive DSL

Autoři

Anderson, T A; Liu, H; Kuper, L; Totoni, E; Vitek, J.; Shpeisman, T

Rok

2017

Publikováno

31st European Conference on Object-Oriented Programming (ECOOP 2017). Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2017. p. 41-429. ISSN 1868-8969. ISBN 9783959770354.

Typ

Stať ve sborníku

DOI

10.4230/LIPIcs.ECOOP.2017.4

Pracoviště

Laboratoř výzkumu programování

Anotace

Computational scientists often prototype software using productivity languages that offer high-level programming abstractions. When higher performance is needed, they are obliged to rewrite their code in a lower-level efficiency language. Different solutions have been proposed to address this trade-off between productivity and efficiency. One promising approach is to create embedded domain-specific languages that sacrifice generality for productivity and performance, but practical experience with DSLs points to some road blocks preventing widespread adoption. This paper proposes a non-invasive domain-specific language that makes as few visible changes to the host programming model as possible. We present ParallelAccelerator, a library and compiler for high-level, high-performance scientific computing in Julia. ParallelAccelerator's programming model is aligned with existing Julia programming idioms. Our compiler exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code. Programs can also run in "library-only" mode, letting users benefit from the full Julia environment and libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.

Hadoop-Benchmark: Rapid Prototyping and Evaluation of Self-Adaptive Behaviors in Hadoop Clusters

Autoři

Zhang, B.; Křikava, F.; Rouvoy, R.; Senturier, L.

Rok

2017

Publikováno

2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS). USA: IEEE Computer Society, 2017. p. 175-181. ISBN 978-1-5386-1550-8.

Typ

Stať ve sborníku

DOI

10.1109/SEAMS.2017.15

Pracoviště

Laboratoř výzkumu programování

Anotace

Optimizing Hadoop executions has attracted a lot of research contributions in particular in the domain of self- adaptive software systems. However, these research efforts are often hindered by the complexity of Hadoop operation and the difficulty to reproduce experimental evaluations that makes it hard to compare different approaches to one another. To address this limitation, we propose a research acceleration platform for rapid prototyping and evaluation of self-adaptive behavior in Hadoop clusters. Essentially, it provides automated approach to provision reproducible Hadoop environments and execute acknowledged benchmarks. It is based on the state- of-the-art container technology that supports both distributed configurations as well as standalone single-host setups. We demonstrate the approach on a complete implementation of a concrete Hadoop self-adaptive case study.

Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology

Autoři

Zakowski, Y; Cachera, D; Demange, D; Petri, G; Pichardie, D; Jagannathan, S; Vitek, J.

Rok

2017

Publikováno

International Conference on Interactive Theorem Prooving (ITP). Berlin: Springer-Verlag, 2017.

Typ

Stať ve sborníku

Pracoviště

Laboratoř výzkumu programování

Making Android Run on Time

Autoři

Yan, Y; Dantu, K; Ko, S Y; Ziarek, L; Vitek, J.

Rok

2017

Publikováno

Real-Time and Embedded technology and Applications Symposium (RTAS 2017). San Francisco: American Institute of Physics and Magnetic Society of the IEEE, 2017.

Typ

Stať ve sborníku

Pracoviště

Laboratoř výzkumu programování