Reclaiming Data Structures with Cycles
Friday, 14 October 2016
In my CppCon 2016 Trip Report I mentioned Herb Sutter's Plenary, and his deferred reclamation library. The problem to be solved is that some data structures cannot be represented as a DAG, where each node in the structure has a clear owner; these data structures are general graphs, where any node may hold a pointer to any other node, and thus you cannot say that a node is no longer needed just because one of the pointers to it has been removed. Likewise, it is possible that there may be a whole segment of the data structure which is no longer accessible from outside, and the only remaining pointers are those internal to the data structure. This whole section should therefore be destroyed, otherwise we have a leak.
In the data structure shown below, the left-hand set of nodes have 3 external pointers keeping them alive, but the right-hand set are not accessible from outside, and so should be destroyed.
Deferred Reclamation
Herb's solution is to use a special pointer type deferred_ptr<T>
, and a
special heap and allocator to create the nodes. You can then explicitly check
for inaccessible nodes by calling collect()
on the heap object, or rely on all
the nodes being destroyed when the heap itself is destroyed. For the details,
see Herb's description.
One thing that stood out for me was the idea that destruction was deferred — nodes are not reclaimed immediately, but at some later time. Also, the use of a custom allocator seemed unusual. I wondered if there was an alternative way of handling things.
Internal pointers vs Root pointers
I've been contemplating the issue for the last couple of weeks, and come up with an alternative scheme that destroys unreachable objects as soon as they become unreachable, even if those unreachable objects hold pointers to each other. Rather than using a custom heap and allocator, it uses a custom base class, and distinct pointer types for internal pointers and root pointers.
My library is also still experimental, but the source code is freely available under the BSD license.
The library provides two smart pointer class templates: root_ptr<T>
and
internal_ptr<T>
. root_ptr<T>
is directly equivalent to
std::shared_ptr<T>
: it is a reference-counted smart pointer. For many uses,
you could use root_ptr<T>
as a direct replacement for std::shared_ptr<T>
and your code will have identical behaviour. root_ptr<T>
is intended to
represent an external owner for your data structure. For a tree it could
hold the pointer to the root node. For a general graph it could be used to hold
each of the external nodes of the graph.
The difference comes with internal_ptr<T>
. This holds a pointer to another
object within the data structure. It is an internal pointer to another
part of the same larger data structure. It is also reference counted, so if
there are no root_ptr<T>
or internal_ptr<T>
objects pointing to a given
object then it is immediately destroyed, but even one internal_ptr<T>
can be
enough to keep an object alive as part of a larger data structure.
The "magic" is that if an object is only pointed to by internal_ptr<T>
pointers, then it is only kept alive as long as the whole data structure has an
accessible root in the form of an root_ptr<T>
or an object with an
internal_ptr<T>
that is not pointed to by either an root_ptr<T>
or an
internal_ptr<T>
.
This is made possible by the internal nodes deriving from internal_base
, much
like std::enable_shared_from_this<T>
enables additional functionality when
using std::shared_ptr<T>
. This base class is then passed to the
internal_ptr<T>
constructor, to identify which object the internal_ptr<T>
belongs to.
For example, a singly-linked list could be implemented like so:
class List{
struct Node: jss::internal_base{
jss::internal_ptr<Node> next;
data_type data;
Node(data_type data_):next(this),data(data_){}
};
jss::root_ptr<Node> head;
public:
void push_front(data_type new_data){
auto new_node=jss::make_owner<Node>(new_data);
new_node->next=head;
head=new_node;
}
data_type pop_front(){
auto old_head=head;
if(!old_head)
throw std::runtime_error("Empty list");
head=old_head->next;
return old_head->data;
}
void clear(){
head.reset();
}
};
This actually has an advantage over using std::shared_ptr<Node>
for the links
in the list, due to another feature of the library. When a group of interlinked
nodes becomes unreachable, then firstly each node is marked as unreachable, thus
making any internal_ptr<T>
s that point to them become equal to nullptr
. Then
all the unreachable nodes are destroyed in turn. All this is done with iteration
rather than recursion, and thus avoids the deep recursive destructor chaining
that can occur when using std::shared_ptr<T>
. This is similar to the behaviour
of Herb's deferred_ptr<T>
during a collect()
call on the deferred_heap
.
local_ptr<T>
completes the set: you can use a local_ptr<T>
when traversing a
data structure that uses internal_ptr<T>
. local_ptr<T>
does not hold a
reference, and is not in any way involved in the lifetime tracking of the
nodes. It is intended to be used when you need to keep a local pointer to a
node, but you're not updating the data structure, and don't need that pointer to
keep the node alive. e.g.
class List{
// as above
public:
void for_each(std::function<void(data_type&)> f){
jss::local_ptr<Node> node=head;
while(node){
f(node->data);
node=node->next;
}
}
}
Warning: root_ptr<T>
and internal_ptr<T>
are not safe for use if
multiple threads may be accessing any of the nodes in the data structure while
any thread is modifying any part of it. The data structure as a whole
must be protected with external synchronization in a multi-threaded context.
How it works
The key to this system is twofold. Firstly the nodes in the data structure
derive from internal_base
, which allows the library to store a back-pointer to
the smart pointer control block in the node itself, as long as the head of the
list of internal_ptr<T>
s that belong to that node. Secondly, the control
blocks each hold a list of back-pointers to the control blocks of the objects
that point to them via internal_ptr<T>
. When a reference to a node is dropped
(either from an root_ptr<T>
or an internal_ptr<T>
), if that node has no
remaining root_ptr<T>
s that point to it, the back-pointers are checked. The
chain of back-pointers is followed until either a node is found that has an
root_ptr<T>
that points to it, or a node is found that does not have a
control block (e.g. because it is allocated on the stack, or owned by
std::shared_ptr<T>
). If either is found, then the data structure is
reachable, and thus kept alive. If neither is found once all the
back-pointers have been followed, then the set of nodes that were checked is
unreachable, and thus can be destroyed. Each of the unreachable nodes is then
marked as such, which causes internal_ptr<T>
s that refer to them to become
nullptr
, and thus prevents resurrection of the nodes. Finally, the unreachable
nodes are all destroyed in an unspecified order. The scan and destroy is done
with iteration rather than recursion to avoid the potential for deep recursive
nesting on large interconnected graphs of nodes.
The downside is that the time taken to drop a reference to a node is dependent on the number of nodes in the data structure, in particular the number of nodes that have to be examined in order to find an owned node.
Note: only dropping a reference to a node (destroying a pointer, or reassigning a pointer) incurs this cost. Constructing the data structure is still relatively low overhead.
Feedback
Please let me know if you have any comments on my internal pointer library, especially if you have either used it successfully, or have tried to use it and found it doesn't work for you.
Posted by Anthony Williams
[/ cplusplus /] permanent link
Tags: cplusplus, reference counting, garbage collection
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Design and Content Copyright © 2005-2025 Just Software Solutions Ltd. All rights reserved. | Privacy Policy
7 Comments
i am impressed by the quality work being displayed here having lots of meaningful posts written here!!!
great job!!!
Reminds me a bit of the solution I came up with for the regex objects in Boost.Xpressive, which I describe here: <http://bit.ly/2dC5ixu>. Is cleanup guaranteed nothrow (no-alloc) with your scheme?
@Eric: Thanks for your comment. I hadn't seen your scheme for regex tracking, it does look similar at first glance. I'm interested to look at the code now --- I thought about the idea of using the different pointer types to keep track of reachability more cheaply, but didn't get anywhere.
No, my implementation doesn't guarantee no-throw cleanup; currently it builds a pair of vectors holding pointers it has processed when checking for reachability, and pointers that still need checking. It's possible that the allocation for these could be moved earlier.
I'm sure the code can be improved, maybe dramatically, but I wanted to write it up while the idea was fresh in my mind.
Aren't weak pointers already sufficient for the job? Java did it that way AFAIK.
@Marek: no, weak pointers are not sufficient. weak_ptr does not prevent destruction, and requires that there is a shared_ptr somewhere that owns the object. internal_ptr is different: an internal_ptr *does* prevent destruction, provided that the object containing the internal_ptr is itself owned.
Java has implicit Garbage Collection, so doesn't need any of this.
Truly an awesome post.I really like it.I wanna tell you that we are providing discount coupon here.If you want to create another blog like this,then visit our website.Here you find some discount coupon through Justhost.Thank you.