rpc : resource management rework#7562
Conversation
ggml-rpc.cpp
Outdated
There was a problem hiding this comment.
should we lock the instances here? will we access it from different thread simultaneously?
There was a problem hiding this comment.
as I said in the description this implementation is not thread-safe; I'd like to get some feedback on the reference count approach and then I will add thread-safety
ggml-rpc.cpp
Outdated
There was a problem hiding this comment.
looks we have 2 reference counts for same object: 1. rpc_backend.ref_count, 2, inside the rpc_backend_ptr.
IMO, better to merge then together.
There was a problem hiding this comment.
maybe we can try something link intrusive shared pointer: std::enable_shared_from_this
There was a problem hiding this comment.
Yes, I think that the correct way to do this would be to change the pointers in instances to weak_ptr, then the backend will be automatically freed by shared_ptr on the last instance. create_rpc_backend would need to check if the weak_ptr in instances is still alive.
There was a problem hiding this comment.
yeah, that could be the best, inside the instances we maintain a weak_ptr<rpc_backend>, without increase the reference count, and then at the create_rpc_backend we obtain shared_ptr from weak_ptr.
this also save the erase call at free_rpc_backend
There was a problem hiding this comment.
Unfortunately it is not that simple. We keep a backend reference in ggml_backend_rpc_buffer_type_context and we use this reference when allocating new buffers. If we free this reference in ggml_backend_rpc_free() then we won't be able to allocate new buffers.
There was a problem hiding this comment.
IIRC, we have several structure have a reference to rpc_backend here:
ggml_backend_rpc_buffer_type_contextggml_backend_rpc_buffer_contextggml_backend_rpc_contextinstances.
first 1-3 should have a strong reference to let them allocate buffer after ggml_backend_rpc_free call explicitly - that's the case you mention.
and for the 4, i think we can be make it as weak_ptr since the life-span of ggml_backend_rpc_context is expected longer than the rpc_backend?
There was a problem hiding this comment.
When do we free the reference which is kept in ggml_backend_rpc_buffer_type_context? If you free this reference, then you can no longer allocate buffers. If you don't free this reference, then the backend will never be freed because of this reference.
I don't see how to solve this without explicit reference count. I also don't see how weak_ptr can help here.
There was a problem hiding this comment.
Buffer types are never freed, I think it would complicate the user code too much to require buffer types to be freed, for no real benefit. In the RPC backend, buffer types only need a connection during initialization and during buffer allocation. You can make a connection during initialization to obtain the alignment and max size, drop it, and only open it again once a buffer is allocated, and then rely on the shared_ptr to keep it alive until all buffers and backend instances have been freed.
This is probably complicated because the RPC backend uses a ggml_backend instance to represent a connection, but connections and ggml_backend backend instances should probably be different concepts.
There was a problem hiding this comment.
ok, let's say that the connection is not tied to the backend instance and we have a separate entity (e.g. rpc_connection) which is being referenced with shared_ptr. How do we obtain a shared_ptr to the connection when a new buffer is allocated?
There was a problem hiding this comment.
I think something like this should work:
std::shared_ptr<connection_t> get_connection(const std::string & endpoint) {
static std::map<std::string, std::weak_ptr<connection_t>> connections;
auto it = connections.find(endpoint);
if (it != connections.end()) {
if (auto connection = it->second.lock()) {
return connection;
}
}
auto connection = std::make_shared<connection_t>(endpoint);
connections[endpoint] = connection;
return connection;
}Use this function when you need to create a new buffer or backend instance. You can also use it during initialization of the buffer type, just use a local shared_ptr in the function only, but don't keep the instance.
|
I have decoupled the backend and its corresponding socket. Sockets are cached and shared between RPC buffers. |
|
Nice work! |
|
draw a picture for the backend objects here, for better understanding: |

This patch tries to address the concerns raised in PR #7435. We track how many times an RPC backend is referenced and we deallocate its resources when this count becomes 0. The reference count is increased when a new RPC buffer is allocated or
ggml_backend_rpc_init()is called. Respectively it is decreased when an RPC buffer is freed orggml_backend_rpc_free()is called.The implementation is not thread-safe. I will address thread-safety in a follow-up patch.