Userspace API for sharing memory between unrelated processes

Posted by MConrad 
November 13, 2009 03:30AM
Hi, I was playing with ideas for sending data between processes without copying, and came up with the following idea for a memory page sharing API. I'm wondering if there are existing facilities that can handle this, whether this API makes sense, and how much potential work it might be to implement (as a kernel patch).

Suppose I have a range of pages that I have allocated with mmap(MAP_ANONYMOUS|MAP_SHARED) and I want to deliver them to another process. I may want to give the other process read access, or actually *give* the pages to that process such that I no longer have them mapped.

Here's my idea for an API to do this:
/**
 * Create an offer to the specified process for the given range of address space.
 * The address space must be a whole multiple of pages.
 * The address space must be validly mapped for the current process.
 * Changes specified by this function do not occur until the remote process
 *   accepts the offer.
 * An offer can be cancelled by offering the same address with a size of 0.
 *   If the target process has already accepted the offer, it is too late, and the
 *   call will simply fail as this address is no longer valid for the current process.
 * If all permissions are dropped, then the memory is unmapped from the current
 *   process when the offer is claimed.
 * Flags:
 *    VMO_GRANT_R - grant read access to the remote process
 *    VMO_GRANT_W - grant write access to the remote process
 *    VMO_GRANT_X - grant exec access to the remote process
 *    VMO_DROP_R - release this process' read permission for the pages
 *    VMO_DROP_W - release this process' write permission for the pages
 *    VMO_DROP_X - release this process' exec permission for the pages
 *
 *    VMO_DISOWN - combination of all "drop" flags
 *    VMO_GRANTALL - combination of all "grant" flags
 */
int vm_offer(int dest_pid, void *addr, int len, int flags);

/**
 * Receive an offer of VM pages from another process.
 * The offer must have already been made by the source process.
 * The offer is identified (from other offers made by that process) by specifying the remote virtual address.
 * Flags:
 *   (none, yet)
 */
int vm_claim(int src_pid, void *remote_addr, void **local_addr, int *size, int flags);

/**
 * List sent and received offers.
 * list is a buffer to which an array of offer records will be written
 * count should be initialized to the size (in records) of the list
 * upon successful return, list is populated, and count is the number of entries
 * upon failure, list is unaltered and count is the number of entries needed in order to return successfully
 * Note that offers are dynamic in nature and may change before the entries can be acted on.
 */
int vm_listoffer(vm_offer_t *list, int *count);

I'm aware that I can get a similar effect by creating a file in tmpfs, mmapping it into the second process, and unmapping it from the first. However, I don't like the overhead of tmpfs and having to choose filenames. If I build a messaging system on this API with lots of tiny 1-page messages, it seems that the overhead of creating tmpfs files could be significant.

Perhaps I should have stated that sooner: I'm looking for a mechanism that can handle lots and lots of action, such that it is more efficient than writing the data to a pipe (and has the added advantage that any two processes can use it without any setup work, rather than needing to create pipes or sockets and client/server designs).

Any advice? Is something like this out there? Is it conceivable that an implementation could operate faster than writing 1000 bytes through a UNIX socket?

Thanks in advance

P.S. It occurs to me that if this API were implemented, its use could be extended to MANY things, like passing kernel-allocated data buffers to userspace in a more streamlined and simple manner, again without copying anything.
Re: Userspace API for sharing memory between unrelated processes
November 13, 2009 07:41AM
Why does POSIX shared memory not work for you?
MConrad
Re: Userspace API for sharing memory between unrelated processes
November 13, 2009 01:12PM
POSIX shared memory is really just a fancy way to create files in tmpfs and mmap them. So, the reason is overhead during open/close and having to give a unique name to each message.
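
For readers following along, here is a minimal sketch of what one message costs with POSIX shared memory. The helper names `shm_send` and `shm_recv` are made up for illustration, and the sketch assumes both sides already agree on the object name and length. Each message needs shm_open, ftruncate, mmap, munmap, close and shm_unlink, which is the overhead and bookkeeping described above. On older glibc you may need to link with -lrt.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sender side (hypothetical helper): publish one message as a named
 * shared memory object. The name must be unique per message.
 * Returns 0 on success, -1 on error. */
int shm_send(const char *name, const void *msg, size_t len)
{
    assert(name != NULL && msg != NULL && len > 0);

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        shm_unlink(name);
        return -1;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* closing the fd does not affect the mapping */
    if (p == MAP_FAILED) {
        shm_unlink(name);
        return -1;
    }
    memcpy(p, msg, len);            /* a real sender would build the message in place */
    munmap(p, len);                 /* sender gives up its view; the name keeps the object */
    return 0;
}

/* Receiver side (hypothetical helper): map the object read-only, then
 * remove the name so the memory is freed once this mapping goes away.
 * Returns the mapping, or NULL on error. The caller munmap()s it. */
void *shm_recv(const char *name, size_t len)
{
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    shm_unlink(name);               /* skip this and the object outlives both processes */
    return p;
}
```

If the receiver never runs, nothing calls shm_unlink and the object stays in tmpfs, which is the leftover-object problem.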

Also, I find it annoying that a shared memory object can be left behind after all processes using it have exited. So, it's also extra bookkeeping.

Oh, and another reason is that you have to deal with user/group permissions instead of just specifying a target process.

I like things to be as simple as possible. And what I really want to do is just pass pages of memory from one process to another.
Re: Userspace API for sharing memory between unrelated processes
November 16, 2009 01:46PM
POSIX shm is a standard that works now, and is OK for quite a few cases. So using it where possible is better than reinventing the wheel :-)

What is your use case?
Maybe zero-copy pipes (created with vmsplice() and co) will handle it, if shm does not?
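
A minimal sketch of the vmsplice() idea, assuming Linux and an already-created pipe. The helper names `send_pages` and `recv_bytes` are made up for illustration. vmsplice() lets the pipe reference the sender's pages instead of copying them in, so the sender must not modify the buffer until the reader has consumed it. Note that a plain read() on the receiving end still copies the data out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: hand the pages backing buf to the pipe without
 * copying them. Returns the number of bytes spliced, or -1 on error. */
ssize_t send_pages(int pipe_wr, void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    return vmsplice(pipe_wr, &iov, 1, 0);
}

/* Hypothetical helper: read exactly len bytes from the pipe.
 * This side copies, as any read() does. Returns len, or -1 on error/EOF. */
ssize_t recv_bytes(int pipe_rd, void *out, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(pipe_rd, (char *)out + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

This still needs a pipe between the two processes, so it does not remove the setup step.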

The API you propose is somewhat questionable, e.g. because of the performance penalty caused by page table changes. AFAIK it is faster to copy a couple of kilobytes with memcpy() than to remap a page.

Of course you may still implement it and prove that I'm wrong :-)
MConrad
Re: Userspace API for sharing memory between unrelated processes
November 18, 2009 04:50PM
I didn't have any specific use case. I know how to solve lots of IPC problems, but I was just playing around with an idea of how to solve them more easily (less code, less development time), and maybe more efficiently (less run time).

vmsplice looked useful, but not really general purpose. I'd still have to set up a pipe between processes first.

I have no idea how much work is involved with altering page tables, which is why I'm asking on a kernel newbies forum :-)

My overall idea was to implement general-purpose IPC (large, small, or huge messages) with no dependence on user id, files, permissions, dedicated listener threads, network ports, or unique symbol names, and the simplest API possible. Also, it would be nice if the API consisted of atomic operations, instead of all the usual partial-message-in-a-buffer interrupted-by-signal etc sort of thing.

> AFAIK it is faster to copy a couple of kilobytes with memcpy() than to remap a page

Do you have any suggestions on how I might test this? Anything I could read to learn the details? Recommendations on what source file to read? (The mmap implementation would be a good start, I guess.)
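
One rough way to probe this from userspace, offered as a sketch rather than a definitive benchmark: time copying one page with memcpy() against moving one page between two addresses with mremap(). The mremap() move happens within a single process, so it is only a proxy for the page table update and TLB work a cross-process handoff would need, and it leaves out any cross-process costs. The function names `time_memcpy` and `time_remap` are made up for illustration, and results will vary with kernel and hardware.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Seconds spent copying one page `iters` times, or -1.0 on error. */
double time_memcpy(int iters)
{
    assert(iters > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *src = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (src == MAP_FAILED)
        return -1.0;
    char *dst = src + page;
    memset(src, 1, 2 * page);           /* fault both pages in up front */

    double t0 = now_sec();
    for (int i = 0; i < iters; i++) {
        src[0] = (char)i;
        memcpy(dst, src, page);
        /* compiler barrier (GCC/Clang) so the copy is not optimized away */
        __asm__ __volatile__("" : : "r"(dst) : "memory");
    }
    double elapsed = now_sec() - t0;
    munmap(src, 2 * page);
    return elapsed;
}

/* Seconds spent bouncing one page between two adjacent address slots
 * `iters` times with mremap(), or -1.0 on error. */
double time_remap(int iters)
{
    assert(iters > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *slot_a = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slot_a == MAP_FAILED)
        return -1.0;
    char *slot_b = slot_a + page;
    char *cur = slot_a;
    cur[0] = 42;                        /* marker that must travel with the page */

    double elapsed = -1.0;
    double t0 = now_sec();
    int i;
    for (i = 0; i < iters; i++) {
        char *dest = (cur == slot_a) ? slot_b : slot_a;
        void *p = mremap(cur, page, page, MREMAP_MAYMOVE | MREMAP_FIXED, dest);
        if (p == MAP_FAILED)
            break;
        cur = p;
        if (cur[0] != 42)               /* touch the page at its new address */
            break;
    }
    if (i == iters)
        elapsed = now_sec() - t0;
    munmap(slot_a, 2 * page);           /* covers whichever slot is mapped */
    return elapsed;
}
```

Calling both with the same iteration count (say, a million) and dividing by the count gives a per-operation figure to compare. Running the remap loop under perf would also show which kernel functions dominate, which is a reasonable pointer to the source files worth reading.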