doc. Ing. Jan Janoušek, Ph.D.

Vedoucí Katedry teoretické informatiky

Publikace

Forward linearised tree pattern matching using tree pattern border array

Autoři
Rok
2024
Publikováno
Discrete Applied Mathematics. 2024, 352 33-43. ISSN 1872-6771.
Typ
Článek
Anotace
We define a tree pattern border array as a property of linearised trees analogous to border arrays from the string domain. We consider tree patterns with node wildcards and subtree wildcards. Then we use the tree pattern border array in a new forward tree pattern matching algorithm for ordered ranked trees. The algorithm finds all occurrences of a single linearised tree pattern in a linearised input tree. As with the classical Morris–Pratt algorithm for searching in strings, the tree pattern border array is used to determine shift lengths in the searching phase of the tree pattern matching algorithm. We compare this new algorithm with the best performing previously existing algorithms based on backward linearised tree pattern matching algorithms, (non-)linearised tree pattern matching algorithms using finite tree automata or stringpath matchers. We show that our new algorithm outperforms other tree pattern matching algorithms in single tree pattern matching.

Inexact tree pattern matching with 1-degree edit distance using finite automata

Autoři
Rok
2023
Publikováno
Discrete Applied Mathematics. 2023, 330 78-97. ISSN 0166-218X.
Typ
Článek
Anotace
Given an input tree and a tree pattern, the inexact tree pattern matching problem is about finding all subtrees in the input tree that match the tree pattern with up to errors. To measure the number of errors between two labeled ordered trees, we use the 1-degree edit distance that uses operations node relabel, leaf insert, and leaf delete. The proposed solution is based on a finite automaton that reads a given input tree represented in linear, prefix bar, notation. First, we show its construction for a constrained variant of 1-degree edit distance where leaf insert/delete operations cannot be applied to the tree pattern recursively. Then, we extend this approach to both unit cost and non-unit cost 1-degree edit distance. The deterministic version of the proposed finite automaton finds all inexact occurrences of the tree pattern in time linear to the size of the input tree. However, since the size of such automaton can be exponential in the number of nodes of the tree pattern, it is practical only for small patterns. Therefore, we also consider the dynamic programming approach as a way of simulating the nondeterministic finite automaton. This approach comes with the space complexity and time complexity where is the number of nodes of the tree pattern, is the number of nodes of the input tree, and is the maximum number of errors allowed.

On the Smallest Synchronizing Terms of Finite Tree Automata

Autoři
Blažej, V.; Janoušek, J.; Plachý, Š.
Rok
2023
Publikováno
Implementation and Application of Automata. Springer, Cham, 2023. p. 79-90. ISSN 1611-3349. ISBN 978-3-031-40247-0.
Typ
Stať ve sborníku
Anotace
This paper deals with properties of synchronizing terms for finite tree automata, which is a generalization of the synchronization principle of deterministic finite string automata (DFA) and such terms correspond to a connected subgraph, where a state in the root is always the same regardless of states of subtrees attached to it. We ask, what is the maximum height of the smallest synchronizing term of a deterministic bottom-up tree automaton (DFTA) with n states, which naturally leads to two types of synchronizing terms, called weak and strong, that depend on whether a variable, i.e., a placeholder for a subtree, must be present in at least one leaf or all of them. We prove that the maximum height in the case of weak synchronization has a theoretical upper bound sl(𝑛)+𝑛−1, where sl(𝑛) is the maximum length of the shortest synchronizing string of an n-state DFAs. For strong synchronization, we prove exponential bounds. We provide a theoretical upper bound of 2^𝑛−𝑛−1 for the height and two constructions of automata approaching it. One achieves the height of Θ(2^(𝑛−√𝑛)) with an alphabet of linear size, and the other achieves 2^(𝑛−1)−1 with an alphabet of quadratic size.

Automata Approach to Inexact Tree Pattern Matching Using 1-degree Edit Distance

Autoři
Rok
2021
Publikováno
Proceedings of the Prague Stringology Conference 2021. Praha: CESKE VYSOKE UCENI TECHNICKE V PRAZE, 2021. p. 1-15. ISBN 978-80-01-06869-4.
Typ
Stať ve sborníku
Anotace
We compare labeled ordered trees based on unit cost 1-degree edit distance that uses operations vertex relabeling, leaf insertion, and leaf deletion. Given an input tree T and a tree pattern P, we find all subtrees in T that match P with up to k errors. We show that this problem can be solved by finite automaton when T and P are represented in linear, prefix bar, notation. First, we solve this problem by a pushdown automaton. Then, we show that it can be transformed into a nondeterministic finite automaton due to its restricted use of the pushdown store. We also show a simulation of the nondeterministic finite automaton by dynamic programming.

Conversion of Finite Tree Automata to Regular Tree Expressions By State Elimination

Rok
2020
Publikováno
Proceedings of the Prague Stringology Conference 2020. Praha: Czech Technical University in Prague, 2020. p. 11-22. ISBN 978-80-01-06749-9.
Typ
Stať ve sborníku
Anotace
Regular tree languages can be accepted and described by finite tree automata and regular tree expressions, respectively. We describe a new algorithm that converts a finite tree automaton to an equivalent regular tree expression. Our algorithm is analogous to the well-known state elimination method of the conversion of a string finite automaton to an equivalent string regular expression. We define a generalised finite tree automaton, the transitions of which read the sets of trees described by regular tree expressions. Our algorithm eliminates states of the generalised finite tree automaton, which is analogous to the elimination of states in converting the string finite automaton.

Forward Linearised Tree Pattern Matching Using Tree Pattern Border Array

Autoři
Rok
2020
Publikováno
Proceedings of the Prague Stringology Conference 2020. Praha: Czech Technical University in Prague, 2020. p. 61-73. ISBN 978-80-01-06749-9.
Typ
Stať ve sborníku
Anotace
We define a tree pattern border array as a property of linearised trees analogous to border arrays from the string domain. We use it to define a new forward tree pattern matching algorithm for ordered trees, which finds all occurrences of a single given linearised tree pattern in a linearised input tree. As with the classical Morris-Pratt algorithm, the tree pattern border array is used to determine shift lengths in the matching phase of the tree pattern matching algorithm. We compare the new algorithm with the best performing previously existing algorithms based on backward linearised tree pattern matching algorithms, (non-)linearised tree pattern matching algorithms using finite tree automata or stringpath matchers. We show that the presented algorithm outperforms these for single tree pattern matching.

On modification of Boyer-Moore-horspool's algorithm for tree pattern matching in linearised trees

Autoři
Trávníček, J.; Janoušek, J.; Melichar, B.; Cleophas, L.
Rok
2020
Publikováno
Theoretical Computer Science. 2020, 830 60-90. ISSN 0304-3975.
Typ
Článek
Anotace
Tree pattern matching on ordered trees is an important problem in Computer Science. Ordered trees can be represented as strings with additional properties via various linearisations. We present a backward tree pattern matching algorithm for ordered trees for various linear representations of trees and tree patterns. The algorithm adaptations find all occurrences of a single given tree pattern which match an input tree regardless of the chosen linearisation. The algorithms preserve the properties and advantages of standard backward string pattern matching using Boyer-Moore-Horspool's bad character shift heuristics. The number of symbol comparisons in the backward tree pattern matching can be sublinear in the size of the input tree. As in the case of the string version of Boyer-Moore-Horspool's matching algorithm, the size of the bad character shift table used by the algorithm is linear in the size of the alphabet. We compare the algorithm adaptations with the algorithm using originally chosen linear representation and with the best performing previously existing algorithms based on (non-linearised) tree pattern matching using finite tree automata or stringpath matchers. We show that the presented backward tree pattern matching algorithms outperform the non-linearising ones for single pattern matching and they perform among themselves comparably. (C) 2020 Elsevier B.V. All rights reserved.

On Synchronizing Tree Automata and Their Work–Optimal Parallel Run, Usable for Parallel Tree Pattern Matching

Rok
2020
Publikováno
SOFSEM 2020: Theory and Practice of Computer Science. Cham: Springer, 2020. p. 576-586. Lecture Notes in Computer Science. vol. 12011. ISSN 0302-9743. ISBN 978-3-030-38918-5.
Typ
Stať ve sborníku
Anotace
We present a way of synchronizing finite tree automata: We define a synchronizing term and a k-local deterministic finite bottom–up tree automaton. Furthermore, we present a work–optimal parallel algorithm for parallel run of the deterministic k-local tree automaton in O(log n) time with ⌈n/ log n⌉ processors, for k ≤ log n, or in O(k) time with ⌈n/ k⌉ processors, for k ≥ log n, where n is the number of nodes of an input tree, on EREW PRAM. Finally, we prove that the deterministic finite bottom–up tree automaton that is used as a standard tree pattern matcher is k-local with respect to the height of a tree pattern.

Automata Approach to XML Data Indexing

Rok
2018
Publikováno
Information. 2018, 9(1), ISSN 2078-2489.
Typ
Článek
Anotace
The internal structure of XML documents can be viewed as a tree. Trees are among the fundamental and well-studied data structures in computer science. They express a hierarchical structure and are widely used in many applications. This paper focuses on the problem of processing tree data structures; particularly, it studies the XML index problem. Although there exist many state-of-the-art methods, the XML index problem still belongs to the active research areas. However, existing methods usually lack clear references to a systematic approach to the standard theory of formal languages and automata. Therefore, we present some new methods solving the XML index problem using the automata theory. These methods are simple and allow one to efficiently process a small subset of XPath. Thus, having an XML data structure, our methods can be used efficiently as auxiliary data structures that enable answering a particular set of queries, e.g., XPath queries using any combination of the child and descendant-or-self axes. Given an XML tree model with n nodes, the searching phase uses the index, reads an input query of size m, finds the answer in time O(m) and does not depend on the size of the original XML document.

Constrained Approximate Subtree Matching by Finite Automata

Autoři
Rok
2018
Publikováno
Proceedings of the Prague Stringology Conference 2018. Praha: Czech Technical University in Prague, 2018. p. 79-90. ISBN 978-80-01-06484-9.
Typ
Stať ve sborníku
Anotace
Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the constrained approximate subtree pattern matching problem. A systematic approach to the construction of such matcher by finite automaton, which reads input trees in prefix bar notation, is presented. Given a tree pattern and an input tree with m and n nodes, respectively, the nondeterministic finite automaton for the pattern is constructed and it is able to find all occurrences of the pattern to subtrees of the input tree with maximum given distance k. The distance between the pattern and subtrees of an input tree is measured by minimal number of restricted tree edit operations, called leaf nodes edit operations. The corresponding deterministic finite automaton finds all occurrences in time O(n) and has O(|A|^k m^(k+1)) states, where A is an alphabet containing all possible node labels. Note that the size is not exponential in the number of nodes of the tree pattern but only in the number of errors. In practice, the number of errors is expected to be a small constant that is much smaller than the size of the pattern. To achieve better space complexity, it is also shown how dynamic programming approach can be used to simulate the nondeterministic automaton. The space complexity of this approach is O(m), while the time complexity is O(mn).

Construction of a Pushdown Automaton Accepting a Postfix Notation of a Tree Language Given by a Regular Tree Expression

Autoři
Rok
2018
Publikováno
7th Symposium on Languages, Applications and Technologies (SLATE 2018). Saarbrücken: Dagstuhl Publishing,, 2018. p. 6:1-6:12. ISSN 2190-6807. ISBN 978-3-95977-072-9.
Typ
Stať ve sborníku
Anotace
Regular tree expressions are a formalism for describing regular tree languages, which can be accepted by a finite tree automaton as a standard model of computation. It was proved that the class of regular tree languages is a proper subclass of tree languages those linear notations can be accepted by deterministic string pushdown automata. In this paper, we present a new algorithm for transforming regular tree expressions to equivalent real-time height-deterministic pushdown automata that accept the trees in their postfix notation.

Indexing XML Documents Using Tree Paths Automaton

Rok
2017
Publikováno
6th Symposium on Languages, Applications and Technologies. Saarbrücken: Dagstuhl Publishing,, 2017. p. 10:1-10:14. ISSN 1868-8969. ISBN 978-3-95977-056-9.
Typ
Stať ve sborníku
Anotace
An XML document can be viewed as a tree in a natural way. Processing tree data structures usually requires a pushdown automaton as a model of computation. Therefore, it is interesting that a finite automaton can be used to solve the XML index problem. In this paper, we attempt to support a significant fragment of XPath queries which may use any combination of child (i.e., /) and descendant-or-self (i.e., //) axis. A systematic approach to the construction of such XML index, which is a finite automaton called Tree Paths Automaton, is presented. Given an XML tree model T, the tree is first of all preprocessed by means of its linear fragments called string paths. Since only path queries are considered, the branching structure of the XML tree model can be omitted. For individual string paths, smaller Tree Paths Automata are built, and they are afterwards combined to form the index. The searching phase uses the index, reads an input query Q of size m, and computes the list of positions of all occurrences of Q in the tree T. The searching is performed in time O(m) and does not depend on the size of the XML document. Although the number of queries is clearly exponential in the number of nodes of the XML tree model, the size of the index seems to be, according to our experimental results, usually only about 2.5 times larger than the size of the original document.

Efficient determinization of visibly and height-deterministic pushdown automata

Autoři
Polách, R.; Trávníček, J.; Janoušek, J.; Melichar, B.
Rok
2016
Publikováno
Computer Languages, Systems and Structures. 2016, 2016(46), 91-105. ISSN 1477-8424.
Typ
Článek
Anotace
New algorithms for the determinization of nondeterministic visibly and nondeterministic real-time height-deterministic pushdown automata are presented. The algorithms improve the results of existing algorithms. They construct only accessible states and necessary pushdown symbols of the resulting deterministic pushdown automata.

A new algorithm for the determinisation of visibly pushdown automata

Autoři
Polách, R.; Trávníček, J.; Janoušek, J.; Melichar, B.
Rok
2015
Publikováno
Proceedings of the 2015 Federated Conference on Computer Science and Information Systems. New York: IEEE, 2015. pp. 915-922. ISSN 2300-5963. ISBN 978-83-60810-65-1.
Typ
Stať ve sborníku
Anotace
Visibly pushdown automata are pushdown automata whose pushdown operations are determined by the input symbol, where the input alphabet is partitioned into three parts for push, pop and local pushdown operations. It is well known that nondeterministic visibly pushdown automata can be determinised. In this paper a new algorithm for the determinisation of nondeterministic visibly pushdown automata is presented. The algorithm improves the existing methods and can result in significantly smaller deterministic pushdown automata. This is achieved in a way that only necessary and accessible states and pushdown symbols are computed and constructed during the determinisation.

Backward Linearised Tree Pattern Matching

Autoři
Trávníček, J.; Janoušek, J.; Melichar, B.; Cleophas, L.
Rok
2015
Publikováno
Language and Automata Theory and Applications - 9th International Conference, {LATA} 2015, Nice, France, March 2-6, 2015, Proceedings. Berlin: Springer, 2015. pp. 599-610. LNCS. ISSN 0302-9743. ISBN 978-3-319-15578-4.
Typ
Stať ve sborníku
Anotace
We present a new backward tree pattern matching algorithm for ordered trees. The algorithm finds all occurrences of a single given tree pattern which match an input tree. It makes use of linearisations of both the given pattern and the input tree. The algorithm preserves the properties and advantages of standard backward string pattern matching approaches. The number of symbol comparisons in the backward tree pattern matching can be sublinear in the size of the input tree. As in the case of backward string pattern matching, the size of the bad character shift table used by the algorithm is linear in the size of the alphabet. We compare the new algorithm with best performing previously existing algorithms based on (non-linearised) tree pattern matching using finite tree automata or stringpath matchers and show that it outperforms these for single pattern matching.

On Distributed Concern Delivery in User Interface Design

Autoři
Černý, T.; Macík, M.; Donahoo, M.J.; Janoušek, J.
Rok
2015
Publikováno
COMSIS - Computer Science and Information Systems. 2015, 2015 655-681. ISSN 1820-0214.
Typ
Článek
Anotace
Increasing demands on user interface (UI) usability, adaptability, and dynamic behavior drives ever-growing development and maintenance complexity. Traditional UI design techniques result in complex descriptions for data presentations with significant information restatement. In addition, multiple concerns in UI development leads to descriptions that exhibit concern tangling, which results in high fragment replication. Concern-separating approaches address these issues; however, they fail to maintain the separation of concerns for execution tasks like rendering or UI delivery to clients. During the rendering process at the server side, the separation collapses into entangled concerns that are provided to clients. Such client-side entanglement may seem inconsequential since the clients are simply displaying what is sent to them; however, such entanglement compromises client performance as it results in problems such as replication, fragment granularity ill-suited for effective caching, etc. This paper considers advantages brought by concern-separation from both perspectives. It proposes extension to the aspect-oriented UI design with distributed concern delivery (DCD) for client-server applications. Such an extension lessens the serverside involvement in UI assembly and reduces the fragment replication in provided UI descriptions. The server provides clients with individual UI concerns, and they become partially responsible for the UI assembly. This change increases client-side concern reuse and extends caching opportunities, reducing the volume of transmitted information between client and server to improve UI responsiveness and performance. The underlying aspect-oriented UI design automates the server-side derivation of concerns related to data presentations adapted to runtime context, security, conditions, etc. Evaluation of the approach is considered in a case study applying DCD to an existing, production web application.

Tree String Path Subsequences Automaton and Its Use for Indexing XML Documents

Rok
2015
Publikováno
Languages, Applications and Technologies. Cham: Springer International Publishing AG, 2015. pp. 171-181. ISSN 1865-0929. ISBN 978-3-319-27652-6.
Typ
Stať ve sborníku
Anotace
The theory of indexing texts is well-researched, which does not hold for indexing other data structures, such as trees for example. In this paper a simple method of indexing a tree for subsequences of string paths in the tree by finite automaton is presented. The use of the index is shown on indexing XML documents for XPath descendant-or-self axis inspired queries. Given a subject tree T with n nodes, the tree is preprocessed and an index, which is a directed acyclic subsequence graph for a set of strings, is constructed. The searching phase uses the index, reads an input string path subsequence Q inspired by the specific XPath query of size m and computes the list of positions of all occurrences of Q in the tree T. The searching is performed in time O(m) and does not depend on n. Although the number of distinct valid queries is O(2n), the size of the index is O(hk), where h is the height of the tree T and k is the number of its leaves. Moreover, we discuss that in the case of indexing a common XML document the size of the index is even smaller O(h⋅2k).

A Full and Linear Index of a Tree for Tree Patterns

Autoři
Janoušek, J.; Melichar, B.; Polách, R.; Poliak, M.; Trávníček, J.
Rok
2014
Publikováno
DCFS 2014: Proceedings of the 16th International Workshop on Descriptional Complexity of Formal Systems. Berlin: Springer, 2014. pp. 198-209. Lecture Notes in Computer Science. ISSN 0302-9743. ISBN 978-3-319-09703-9.
Typ
Stať ve sborníku
Anotace
A new and simple method of indexing a tree for tree patterns is presented. A tree pattern is a tree whose leaves can be labelled by a special symbol S, which serves as a placeholder for any subtree. Given a subject tree T with n nodes, the tree is preprocessed and an index, which consists of a standard string compact suffix automaton and a subtree jump table, is constructed. The number of distinct tree patterns which match the tree is O(2n), and the size of the index is O(n). The searching phase uses the index, reads an input tree pattern P of size m and computes the list of positions of all occurrences of the pattern P in the tree T . For an input tree pattern P in linear prefix notation pref(P) = P1SP2S . . . SPk, k  1, the searching is performed in time O(m+k Pi=1 |occ(Pi)|)), where occ(Pi) is the set of all occurrences of Pi in pref(T ).

Efficient Description and Cache Performance in Aspect-Oriented User Interface Design

Autoři
Černý, T.; Macík, M.; Donahoo, M.; Janoušek, J.
Rok
2014
Publikováno
Proceedings of the 2014 Federated Conference on Computer Science and Information Systems. Katowice: Polish Information Processing Society, 2014. pp. 1667-1676. Annals of Computer Science and Information Systems. ISSN 2300-5963. ISBN 978-83-60810-58-3.
Typ
Stať ve sborníku
Anotace
Increasing demands on web user interface (UI) usability, adaptability, and dynamic behavior drives ever growing development and maintenance complexity. Conventional design approaches scale poorly with such rising complexity, resulting in rapidly increasing costs. Much of the complexity centers around data presentation and processing. Recent work greatly reduces such data complexity through the application of Aspect-Oriented UI (AOUI) design, which separates various UI concerns; however, rendering in conventional and even AOUI approaches fails to maintain this separation, often resulting in high reptitions of concern fragments due to tangling. Even worse, mixing of dynamic and immutable components greatly limits caching efficacy as each have differing lifetimes. We extend AOUI design to push down concern separation to rendering, which reduces description size, through repetition reduction, and enables separate caching of individual concerns. Our results show considerable size reduction of UI descriptions for data presentations, faster load times and extended caching capabilities.

Target code selection by tilling ast with the use of tree pattern pushdown automaton

Autoři
Janoušek, J.; Málek, Jaromír
Rok
2014
Publikováno
3rd Symposium on Languages, Applications and Technologies, SLATE 2014. Wadern: Schloss Dagstuhl - Leibniz Center for Informatics, 2014. pp. 159-165. ISSN 2190-6807. ISBN 978-3-939897-68-2.
Typ
Stať ve sborníku
Anotace
A new and simple method for target code selection by tilling an abstract syntax tree is presented. As it is usual, tree patterns corresponding to target machine instructions are matched in the abstract syntax tree. Matching tree patterns is performed with the use of tree pattern pushdown automaton, which accepts all tree patterns matching the abstract syntax tree in the linear postfix bar notation and represents a full index of the abstract syntax tree for tree patterns. The use of the index allows to match patterns quickly, in time depending on the size of patterns and not depending on the size of the tree. The selection of a particular target instruction corresponds to a modification of the abstract syntax tree and also a corresponding incremental modification of the index is performed. A reference to a fully functional prototype is provided.

Tree template matching in unranked ordered trees

Autoři
Christou, M.; Flouri, T.; Iliopoulos, C.S.; Janoušek, J.; Melichar, B.; Pissis, S.P.; Žďárek, J.
Rok
2013
Publikováno
Journal of Discrete Algorithms. 2013, 20 51-60. ISSN 1570-8667.
Typ
Článek
Anotace
We consider the problem of tree template matching, a type of tree pattern matching, where the tree templates have some of their leaves denoted as “donʼt care”, and propose a solution based on the bottom-up technique.

Arbology: Trees and pushdown automata

Autoři
Melichar, B.; Janoušek, J.; Flouri, T.
Rok
2012
Publikováno
Kybernetika. 2012, 48(3), 402-428. ISSN 0023-5954.
Typ
Článek
Anotace
We present a unified and systematic approach to basic principles of Arbology, a new algorithmic discipline focusing on algorithms on trees. Stringology, a highly developed algorithmic discipline in the area of string processing, can use finite automata as its basic model of computation. For various kinds of linear notations of ranked and unranked ordered trees it holds that subtrees of a tree in a linear notation are substrings of the tree in the linear notation. Arbology uses pushdown automata reading such linear notations of trees as its basic model of computation. Basic principles known from stringology are used for the construction of particular arbology algorithms, in which the underlying tree structure is processed with the use of the pushdown store. Arbology results are shown for the basic problems subtree matching and tree indexing for ranked and unranked ordered trees.

Computing all subtree repeats in ordered trees

Autoři
Christou, M.; Crochemore, M.; Flouri, Tomas; Iliopoulos, C S; Janoušek, J.; Melichar, B.; Pisis, S.
Rok
2012
Publikováno
Information Processing Letters. 2012, 112(24), 958-962. ISSN 0020-0190.
Typ
Článek
Anotace
We consider the problem of computing all subtree repeats in a given labeled ordered tree. We first transform the tree to a string representing its postfix notation, and then present an algorithm based on the bottom-up technique to solve it. The proposed algorithm consists of two phases: the preprocessing phase and the phase where all subtree repeats are computed. The linear time and space complexity of the proposed algorithm are important parts of its quality.

Indexing ordered trees for (nonlinear) tree pattern matching

Autoři
Rok
2012
Publikováno
COMSIS - Computer Science and Information Systems. 2012, 9(3), 1125-1153. ISSN 1820-0214.
Typ
Článek
Anotace
A new kind of acyclic pushdown automata for an ordered tree is presented. The tree pattern pushdown automaton and the nonlinear tree pattern pushdown automaton represent a complete index of the tree for tree patterns and nonlinear tree patterns, respectively. Given a tree with n nodes, the numbers of distinct tree patterns and nonlinear tree patterns whichmatch the tree can be atmost 2n-1+n and at most (2+v)n-1+2, respectively, where v is the maximal number of nonlinear variables allowed in nonlinear tree patterns. The total sizes of nondeterministic versions of the two pushdown automata are O(n) and O(n2), respectively. We discuss the time complexities and show timings of our implementations using the bit-parallelismtechnique. The timings show that for a given tree the running time is linear to the size of the input pattern. The presented pushdown automata are input-driven and therefore can be also determinised.

Tree compression pushdown automaton

Autoři
Janoušek, J.; Melichar, B.; Poliak, M.
Rok
2012
Publikováno
Kybernetika. 2012, 48(3), 429-452. ISSN 0023-5954.
Typ
Článek
Anotace
A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with n nodes, the automaton has at most n+1 states, its transition function cardinality is at most 4n and there are 2n+1 pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of log n, the construction of the automaton takes linear time and space with respect to the length n of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.

Tree template matching in ranked ordered trees by pushdown automata

Autoři
Flouri, T.; Iliopoulos, C S; Janoušek, J.; Melichar, B.; Pissis, Solon
Rok
2012
Publikováno
Journal of Discrete Algorithms. 2012, 2012(17), 478-486. ISSN 1570-8667.
Typ
Článek
Anotace
We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the worst case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.

Computing All Subtree Repeats in Ordered Ranked Trees

Autoři
Janoušek, J.; Melichar, B.; Flouri, T.; Crochemore, M.; Ilioupoulos, C.S.; Christou, M.; Pissis, S.
Rok
2011
Publikováno
Lecture Notes in Computer Science. 2011, 7024 338-343. ISSN 0302-9743.
Typ
Článek
Anotace
We consider the problem of finding all subtree repeats in a given ordered ranked tree. Specifically, we transform the given tree to a string representing its postfix notation, and then propose an algorithm based on the bottom-up technique. The proposed algorithm is divided into two phases: the preprocessing phase, and the phase where all subtree repeats are computed. The linear runtime of the algorithm, as well as the use of linear auxiliary space, are important aspects of its quality.

Indexing Trees by Pushdown Automata for Nonlinear Tree Pattern Matching

Autoři
Rok
2011
Publikováno
FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 871-878. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
A new kind of an acyclic pushdown automaton for an ordered tree is presented. The nonlinear tree pattern pushdown automaton represents a complete index of the tree for nonlinear tree patterns and accepts all nonlinear tree patterns which match the tree. Given a tree with n nodes, the number of such nonlinear tree patterns is O((2+v)n), where v is the number of variables in the patterns. We discuss time and space complexities of the nondeterministic nonlinear tree pattern pushdown automaton and a way of its implementation. The presented pushdown automaton is input-driven and therefore can be determinised.

Regular Tree Expressions and Deterministic Pushdown Automata

Autoři
Janoušek, J.; Melichar, B.; Polách, R.
Rok
2011
Publikováno
Proceeding of the 7th Doctoral Workshop on Mathematical and Engineering Methods in Computer Science. Brno: Brno University of Technology, 2011. pp. 70-77. ISBN 978-80-214-4305-1.
Typ
Stať ve sborníku
Anotace
Regular tree expressions are a formalism for describing regular tree languages, which can be accepted by finite tree automata. The class of regular tree languages is a proper subclass of tree languages whose linear notations can be accepted by deterministic string pushdown automata. In this paper we present a simple and straightforward way of transforming a regular tree expression to an equivalent height-deterministic pushdown automaton which reads input ranked and unranked ordered trees in postfix and postfix bar notation, respectively.

Subtree Oracle Pushdown Automata for Ranked and Unranked Ordered Trees

Autoři
Plicka, M.; Janoušek, J.; Melichar, B.
Rok
2011
Publikováno
FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 903-906. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
Oracle modification of subtree pushdown automata for ranked and unranked ordered trees is presented. Subtree pushdown automata [1] represent a complete index of a tree for subtrees. Subtree oracle pushdown automata, as inspired by string factor oracle automaton [2], have the number of states equal to n + 1, where n is the length of a corresponding linear notation of the tree. This makes the space complexity very low. By analogy with the string factor oracle automaton the subtree oracle automata can also accept some subtrees which are not present in the given subject tree. However, the number of such false positive matches is smaller than in the case of the string factor oracle automaton. The presented pushdown automata are input-driven and therefore they can be determinised.

Tree Indexing by Pushdown Automata and Repeats of Subtrees

Autoři
Flouri, T.; Janoušek, J.; Melichar, B.; Iliopoulos, C.S.; Pissis, S.
Rok
2011
Publikováno
FEDCSIS 2011. Los Alamitos: IEEE Computer Society Press, 2011. pp. 899-902. IEEE Catalog Number CFP1185N-USB. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
We consider the problem of finding all subtree repeats in a given unranked ordered tree.We show a new, elegant, and simple method, which is based on the construction of a tree indexing structure called the subtree pushdown automaton. We propose a solution for computing all subtree repeats from the deterministic subtree pushdown automaton constructed over the subject tree. The method we present is directly analogous to the relationship between string deterministic suffix automata and factor repeats in a given string.

Tree Template Matching in Ranked Ordered Trees by Pushdown Automata

Autoři
Flouri, T.; Janoušek, J.; Melichar, B.; Iliopoulos, C.S.; Pissis, S.
Rok
2011
Publikováno
Lecture Notes in Computer Science. 2011, 6807 273-281. ISSN 0302-9743.
Typ
Článek
Anotace
We consider the problem of tree template matching in ranked ordered trees, and propose a solution based on the bottom-up technique. Specifically, we transform the tree pattern matching problem to a string matching problem, by transforming the tree template and the subject tree to strings representing their postfix notation, and then use pushdown automata as the computational model. The method is analogous to the construction of string pattern matchers. The given tree template is preprocessed once, by constructing a nondeterministic pushdown automaton, which is then transformed to the equivalent deterministic one. Although we prove that the space required for preprocessing is exponential to the size of the tree template in the general case, the space required for a specific class of tree templates is linear. The time required for the searching phase is linear to the size of the subject tree in both cases.

Aho-Corasick like multiple subtree matching by deterministic pushdown automata

Autoři
Flouri, T.; Janoušek, J.; Melichar, B.
Rok
2010
Publikováno
Proceedings of the 2010 ACM Symposium on Applied Computing. New York: ACM Press, 2010. pp. 2157-2158. ISBN 978-1-60558-639-7.
Typ
Stať ve sborníku
Anotace
Aho-Corasick like multiple subtree matching by deterministic pushdown automata.

Subtree Matching by Pushdown Automata

Autoři
Flouri, T.; Janoušek, J.; Melichar, B.
Rok
2010
Publikováno
COMSIS - Computer Science and Information Systems. 2010, 7(2), 332-357. ISSN 1820-0214.
Typ
Článek
Anotace
Subtree matching is an important problem in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and nonprocedural programming languages are based on. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata, which read subject trees in prefix and postfix notation, is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and is then determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.

On regular tree languages and deterministic pushdown automata

Autoři
Janoušek, J.; Melichar, B.
Rok
2009
Publikováno
Acta Informatica. 2009, 46(7), 533-547. ISSN 0001-5903.
Typ
Článek
Anotace
The theory of formal string languages and of formal tree languages are both important parts of the theory of formal languages.Regular tree languages are recognized by finite tree automata.Trees in their postfix notation can be seen as strings.This paper presents a simple transformation from any given finite tree automaton recognizing a regular tree language to a deterministic pushdown automaton accepting the same tree language in postfix notation.The resulting deterministic pushdown automaton can be implemented easily by an existing parser generator because it is constructed for an LR(0) grammar, and its size directly corresponds to the size of the deterministic finite tree automaton. The class of regular tree languages in postfix notation is a proper subclass of deterministic context-free string languages. Moreover, the class of tree languages which are in their postfix notation deterministic context-free string languages is a proper superclass of the class of regular tree languages.

String Suffix Automata and Subtree Pushdown Automata

Autoři
Rok
2009
Publikováno
Proceedings of the Prague Stringology Conference 2009. Praha: Czech Technical University in Prague, 2009. p. 160-172. ISBN 978-80-01-04403-2.
Typ
Stať ve sborníku
Anotace
String suffix automata accept all suffixes of a given string and belong to the fundamental stringology principles. Extending their transitions by specific pushdown operations results in new subtree pushdown automata, which accept all subtrees of a given subject tree in prefix notation and are analogous to the suffix automata in their properties. The deterministic subtree pushdown automaton accepts an input subtree in time linear to the number of nodes of the subtree and its total size is linear to the number of nodes of the given subject tree.

Subtree Matching by Deterministic Pushdown Automata

Autoři
Flouri, T.; Janoušek, J.; Melichar, B.
Rok
2009
Publikováno
Proceedings of International Multiconference on Computer Science and Information Technology. New York: IEEE Computer Society Press, 2009. pp. 659-666. ISSN 1896-7094. ISBN 978-83-60810-22-4.
Typ
Stať ve sborníku
Anotace
Subtree matching is an important problem in Computer Science on which a number of tasks, such as mechanical theorem proving, term-rewriting, symbolic computation and nonprocedural programming languages are based on. A systematic approach to the construction of subtree pattern matchers by deterministic pushdown automata, which read subject trees in prefix notation, is presented. The method is analogous to the construction of string pattern matchers: for a given pattern, a nondeterministic pushdown automaton is created and then it is determinised. In addition, it is shown that the size of the resulting deterministic pushdown automata directly corresponds to the size of the existing string pattern matchers based on finite automata.