linux-mm.kvack.org archive mirror
From: Weimin Tchen <wtchen@giganet.com>
To: linux-mm@kvack.org
Subject: questions on having a driver pin user memory for DMA
Date: Wed, 19 Apr 2000 19:02:32 -0400
Message-ID: <38FE3B08.9FFB4C4E@giganet.com>

Hello,

Could you advise a former DEC VMS driver-guy who is a recent Linux
convert with much to learn? I'm working on a driver for a NIC that
supports the Virtual Interface Architecture, which allows user processes
to register arbitrary virtual address ranges for DMA network transmit or
receive. The driver locks the user pages against paging and loads the
NIC with the physical addresses of these pages. Thus the user process
can initiate network DMA using its buffers directly (instead of having a
driver copy between a buffer in kernel memory and a user buffer).

There are at least 3 issues to resolve in registering this user memory
for DMA that I need help on:

1. lock against paging
2. after a fork(), copy-on-write changes the physical address of the
user buffer
3. a memory leak that can hang the system if a process mallocs a
memory buffer, registers this memory, frees the memory, THEN deregisters
the memory.

- Issue 1.
Initially our driver locked memory by incrementing the page count. When
that turned out to be insufficient, I added setting the PG_locked bit
for the page frame. (However this bit is actually for locking during an
IO transfer. Thus I wonder if using PG_locked would cause a problem if
the user memory is also mapped to a file.) Since the PG_locked bit is a
single flag and not a counted semaphore, it also doesn't handle pages
that are registered multiple times. A common case would be 2 adjacent
registrations that end & start on the same page (since the Virtual
Interface Architecture allows buffers to be registered which are NOT
page-aligned). Thus the first deregister will unlock the page even if
it is part of another buffer set up for DMA.
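
To make the counting problem above concrete, here is a minimal user-space
sketch (not kernel code). The names pin_table, pin_range, unpin_range and
page_is_pinned, the 4 KiB page size and the 16-page toy address space are
all assumptions made for illustration. It keeps a pin count per page
instead of a single lock bit, so two registrations that share a page do
not unlock each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12   /* assumed 4 KiB pages */
#define SIM_NPAGES     16   /* size of the toy address space, in pages */

/* Hypothetical per-page pin counts. A real driver would keep something
 * like this next to its own registration records. */
struct pin_table {
    unsigned int pins[SIM_NPAGES];
};

/* Index of the first page touched by a buffer starting at addr. */
static size_t first_page(uintptr_t addr)
{
    return addr >> SIM_PAGE_SHIFT;
}

/* Index of the last page touched by [addr, addr + len), with len > 0. */
static size_t last_page(uintptr_t addr, size_t len)
{
    return (addr + len - 1) >> SIM_PAGE_SHIFT;
}

/* Register a buffer: bump the pin count of every page it touches. */
static void pin_range(struct pin_table *t, uintptr_t addr, size_t len)
{
    assert(len > 0 && last_page(addr, len) < SIM_NPAGES);
    for (size_t p = first_page(addr); p <= last_page(addr, len); p++)
        t->pins[p]++;
}

/* Deregister a buffer: drop one pin from every page it touches. */
static void unpin_range(struct pin_table *t, uintptr_t addr, size_t len)
{
    assert(len > 0 && last_page(addr, len) < SIM_NPAGES);
    for (size_t p = first_page(addr); p <= last_page(addr, len); p++) {
        assert(t->pins[p] > 0);
        t->pins[p]--;
    }
}

/* A page stays pinned until its last registration is dropped. */
static int page_is_pinned(const struct pin_table *t, size_t page)
{
    return t->pins[page] > 0;
}
```

With two unaligned buffers at 0x0800 and 0x1800, each 0x1000 bytes long,
page 1 is touched by both. After the first buffer is unpinned, page 1 is
still pinned by the second, which is the behaviour a single flag bit
cannot provide.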

I'm probably misreading this, but it appears that /mm/memory.c:
map_user_kiobuf() pins user memory by just incrementing the page count.
Will this actually prevent paging or will it only prevent the virtual
address from being deleted from the user address space by a free()? I
see that /drivers/char/raw.c also has an #ifdef'ed call to
lock_kiovec(). This function lock_kiovec() checks the PG_locked flag,
and notes that multiply mapped pages are "Bad news". But our driver
needs to support multiple mappings.

Instead of using flags in the low-level page frame, I tried to use flags
in the vm_area_struct (process memory region) structures. I also hoped
to fix issue 2 (copy-on-write after fork) by setting VM_SHARED along w/
VM_LOCKED. So I tried adding private functions from mlock.c into our
driver, skipping the resource check, not aligning on page
boundaries and not merging segments. (Hopefully this would allow
adjacent registrations in the same page.) However after these changes,
the driver could not load since these routines reference others that
handle memory AVL trees (which had appeared to be public but actually
aren't exported):

- insert_vm_struct(),
- make_pages_present(),
- vm_area_cachep.


- Issue 2. (copy-on-write after fork):
A process uses our driver to register memory for DMA by having the
driver convert the process's buffer virtual pages into physical page
addresses which are then set up in the NIC for DMA. If the process forks
a child, then the Linux kernel appears to avoid overhead by copying the
vm_area_struct's and sharing the actual physical pages. If a write is
done, the child gets the physical pages and the parent gets new physical
pages which are copies. As a result the hardware is not pointing to the
correct physical pages in the parent. I was hoping to prevent this
copy-on-write by making the memory shared (which could have program side
effects) by setting VM_SHARED in the vm_area_struct. (Strangely VM_SHM
doesn't appear to be used much.) But as noted above, I cannot use
functions handling vm_area_struct's like those in mlock.c.
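
The difference between private (copy-on-write) and shared memory across
fork() can be observed from user space without any driver. The sketch
below is only an illustration under assumptions: the helper name
value_seen_by_parent is made up, and it uses an anonymous mmap() rather
than a malloc'ed buffer. It has the child write to the buffer, so it
shows that the two processes stop sharing a physical page, not which
side keeps the original frame.

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one int with the given flag (MAP_PRIVATE or MAP_SHARED), fork,
 * let the child store 42, and return what the parent reads afterwards.
 * Returns -1 on any failure. */
static int value_seen_by_parent(int map_flag)
{
    int *buf = mmap(NULL, sizeof *buf, PROT_READ | PROT_WRITE,
                    map_flag | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    *buf = 7;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(buf, sizeof *buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: on a private mapping this write triggers copy-on-write,
         * so it lands on a different physical page than the parent's. */
        *buf = 42;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(buf, sizeof *buf);
        return -1;
    }

    int seen = *buf;        /* parent's view after the child has written */
    munmap(buf, sizeof *buf);
    return seen;
}
```

Called with MAP_PRIVATE the parent still reads 7, because the write went
to a copy. Called with MAP_SHARED the parent reads 42, because both
processes keep using the same physical page.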

Instead I have *temporarily* solved problems 1 & 2 by setting the
PG_reserved flag in the page frame (instead of PG_locked). But I'd much
appreciate any advice on a better approach.


- Issue 3: memory leak:
There is a system memory leak which results from a slight application
programming error, when a user buffer is free()'ed before being
deregistered by our driver. Repeated operations can hang the system.
When memory is registered, our driver increments the page count to 2.
This appears to prevent the free() & deregister (only decrements to 1)
from releasing the memory. This is actually needed to prevent releasing
the memory before unmapping it from the NIC for DMA. Instead of using the
count, PG_reserved can be used. However this also prevents the count from
getting decremented and the memory from being released as expected.

I had expected free() to just put the memory back on the heap which
would be cleaned up at process exit. But glibc-2.1.2/malloc/malloc.c
indicates that with large buffers, free() calls malloc_trim() which
calls sbrk() with a negative argument. PG_reserved appears to prevent
memory cleanup (/mm/page_alloc.c:__free_pages() checks
if (!PageReserved(page) && atomic_dec_and_test(&page->count)) before
calling free_pages_ok()). I haven't traced how our earlier use of
PG_locked and incrementing the count will also prevent free() from
decrementing the count.
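
The effect of the quoted check can be modelled in a few lines of
user-space C. This is a toy model, not the kernel's code: struct
sim_page and sim_free_page are hypothetical names, and a plain int
stands in for the atomic counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's page frame descriptor. */
struct sim_page {
    int  count;      /* models page->count */
    bool reserved;   /* models the PG_reserved flag */
    bool freed;      /* set once the page would reach free_pages_ok() */
};

/* Mirrors the shape of the quoted test:
 *   if (!PageReserved(page) && atomic_dec_and_test(&page->count))
 * Because && short-circuits, a reserved page is never decremented,
 * so it can never reach zero and is never freed. */
static void sim_free_page(struct sim_page *p)
{
    if (!p->reserved && --p->count == 0)
        p->freed = true;
}
```

Starting from a count of 2, an unreserved page is freed on the second
call. A reserved page keeps its count of 2 no matter how often it is
"freed", which matches the leak described above.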

When a process exits, the file_operations release function is run if the
NIC device has not been closed. Thus by artificially dropping the page
count in this function and doing __free_pages(), the leak can be
prevented. However the driver would need to be modified to have our
library's function to close_the_NIC() not do a system close(), in
order to just use the file_operations release function for final
cleanup. There appear to be other system dependencies involved here, so
I'm not pursuing this further.

I don't understand why process exit code cleans up the virtual address
space before closing remaining devices. (/kernel/exit.c:do_exit() calls
__exit_mm() and later calls __exit_files().) I had hoped to clean up
registered memory when __exit_files() runs our driver's release
function and let __exit_mm() do the rest.

Thanks for any advice,
-Weimin Tchen


Thread overview: 7+ messages
2000-04-19 23:02 Weimin Tchen [this message]
2000-04-20  6:39 ` Eric W. Biederman
2000-04-20  9:20   ` Ingo Oeser
2000-04-20 12:30   ` Stephen C. Tweedie
2000-04-20 12:27 ` Stephen C. Tweedie
2000-04-20 23:43   ` Weimin Tchen
2000-04-21 18:20     ` Kanoj Sarcar
