Functions to support general ended bitmaps.
Copyright (C) 1997-2024 Free Software Foundation, Inc.
This file is part of GCC.
GCC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 3, or (at your option) any later
version.
GCC is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>.
Implementation of sparse integer sets as a linked list or tree.
This sparse set representation is suitable for sparse sets with an
unknown (a priori) universe.
Sets are represented as double-linked lists of container nodes of
type "struct bitmap_element" or as a binary trees of the same
container nodes. Each container node consists of an index for the
first member that could be held in the container, a small array of
integers that represent the members in the container, and pointers
to the next and previous element in the linked list, or left and
right children in the tree. In linked-list form, the container
nodes in the list are sorted in ascending order, i.e. the head of
the list holds the element with the smallest member of the set.
In tree form, nodes to the left have a smaller container index.
For a given member I in the set:
- the element for I will have index is I / (bits per element)
- the position for I within element is I % (bits per element)
This representation is very space-efficient for large sparse sets, and
the size of the set can be changed dynamically without much overhead.
An important parameter is the number of bits per element. In this
implementation, there are 128 bits per element. This results in a
high storage overhead *per element*, but a small overall overhead if
the set is very sparse.
The storage requirements for linked-list sparse sets are O(E), with E->N
in the worst case (a sparse set with large distances between the values
of the set members).
This representation also works well for data flow problems where the size
of the set may grow dynamically, but care must be taken that the member_p,
add_member, and remove_member operations occur with a suitable access
pattern.
The linked-list set representation works well for problems involving very
sparse sets. The canonical example in GCC is, of course, the "set of
sets" for some CFG-based data flow problems (liveness analysis, dominance
frontiers, etc.).
For random-access sparse sets of unknown universe, the binary tree
representation is likely to be a more suitable choice. Theoretical
access times for the binary tree representation are better than those
for the linked-list, but in practice this is only true for truely
random access.
Often the most suitable representation during construction of the set
is not the best choice for the usage of the set. For such cases, the
"view" of the set can be changed from one representation to the other.
This is an O(E) operation:
* from list to tree view : bitmap_tree_view
* from tree to list view : bitmap_list_view
Traversing linked lists or trees can be cache-unfriendly. Performance
can be improved by keeping container nodes in the set grouped together
in memory, using a dedicated obstack for a set (or group of related
sets). Elements allocated on obstacks are released to a free-list and
taken off the free list. If multiple sets are allocated on the same
obstack, elements freed from one set may be re-used for one of the other
sets. This usually helps avoid cache misses.
A single free-list is used for all sets allocated in GGC space. This is
bad for persistent sets, so persistent sets should be allocated on an
obstack whenever possible.
For random-access sets with a known, relatively small universe size, the
SparseSet or simple bitmap representations may be more efficient than a
linked-list set.
LINKED LIST FORM
================
In linked-list form, in-order iterations of the set can be executed
efficiently. The downside is that many random-access operations are
relatively slow, because the linked list has to be traversed to test
membership (i.e. member_p/ add_member/remove_member).
To improve the performance of this set representation, the last
accessed element and its index are cached. For membership tests on
members close to recently accessed members, the cached last element
improves membership test to a constant-time operation.
The following operations can always be performed in O(1) time in
list view:
* clear : bitmap_clear
* smallest_member : bitmap_first_set_bit
* pop_smallest : bitmap_clear_first_set_bit
* choose_one : (not implemented, but could be
in constant time)
The following operations can be performed in O(E) time worst-case in
list view (with E the number of elements in the linked list), but in
O(1) time with a suitable access patterns:
* member_p : bitmap_bit_p
* add_member : bitmap_set_bit / bitmap_set_range
* remove_member : bitmap_clear_bit / bitmap_clear_range
The following operations can be performed in O(E) time in list view:
* cardinality : bitmap_count_bits
* largest_member : bitmap_last_set_bit (but this could
in constant time with a pointer to
the last element in the chain)
* set_size : bitmap_last_set_bit
In tree view the following operations can all be performed in O(log E)
amortized time with O(E) worst-case behavior.
* smallest_member
* pop_smallest
* largest_member
* set_size
* member_p
* add_member
* remove_member
Additionally, the linked-list sparse set representation supports
enumeration of the members in O(E) time:
* forall : EXECUTE_IF_SET_IN_BITMAP
* set_copy : bitmap_copy
* set_intersection : bitmap_intersect_p /
bitmap_and / bitmap_and_into /
EXECUTE_IF_AND_IN_BITMAP
* set_union : bitmap_ior / bitmap_ior_into
* set_difference : bitmap_intersect_compl_p /
bitmap_and_comp / bitmap_and_comp_into /
EXECUTE_IF_AND_COMPL_IN_BITMAP
* set_disjuction : bitmap_xor_comp / bitmap_xor_comp_into
* set_compare : bitmap_equal_p
Some operations on 3 sets that occur frequently in data flow problems
are also implemented:
* A | (B & C) : bitmap_ior_and_into
* A | (B & ~C) : bitmap_ior_and_compl /
bitmap_ior_and_compl_into
BINARY TREE FORM
================
An alternate "view" of a bitmap is its binary tree representation.
For this representation, splay trees are used because they can be
implemented using the same data structures as the linked list, with
no overhead for meta-data (like color, or rank) on the tree nodes.
In binary tree form, random-access to the set is much more efficient
than for the linked-list representation. Downsides are the high cost
of clearing the set, and the relatively large number of operations
necessary to balance the tree. Also, iterating the set members is
not supported.
As for the linked-list representation, the last accessed element and
its index are cached, so that membership tests on the latest accessed
members is a constant-time operation. Other lookups take O(logE)
time amortized (but O(E) time worst-case).
The following operations can always be performed in O(1) time:
* choose_one : (not implemented, but could be
implemented in constant time)
The following operations can be performed in O(logE) time amortized
but O(E) time worst-case, but in O(1) time if the same element is
accessed.
* member_p : bitmap_bit_p
* add_member : bitmap_set_bit
* remove_member : bitmap_clear_bit
The following operations can be performed in O(logE) time amortized
but O(E) time worst-case:
* smallest_member : bitmap_first_set_bit
* largest_member : bitmap_last_set_bit
* set_size : bitmap_last_set_bit
The following operations can be performed in O(E) time:
* clear : bitmap_clear
The binary tree sparse set representation does *not* support any form
of enumeration, and does also *not* support logical operations on sets.
The binary tree representation is only supposed to be used for sets
on which many random-access membership tests will happen.
Bitmap memory usage.