From: Mel Gorman <mel@csn.ul.ie>
To: Alexey Korolev <akorolex@gmail.com>
Cc: Alexey Korolev <akorolev@infradead.org>, linux-mm@kvack.org
Subject: Re: HugeTLB mapping for drivers (sample driver)
Date: Tue, 21 Jul 2009 10:40:00 +0100
Message-ID: <20090721094000.GB25383@csn.ul.ie>
In-Reply-To: <202cde0e0907210232gc8a6119jc7f2ba522d22a80d@mail.gmail.com>

On Tue, Jul 21, 2009 at 09:32:34PM +1200, Alexey Korolev wrote:
> Hi,
> >
> > Did the OOM killer really trigger and select a process for killing or
> > did the process itself just get killed with an out-of-memory message? I
> > would have expected the latter.
> >
>
> The OOM killer triggered for a private mapping on an attempt to access
> a page under that mapping. This was because the code did not check page
> availability at mmap time. Will be fixed.
> 

That's a surprise. I should check out why the OOM killer fired instead
of just killing the application that failed to fault the page.

> >> In fact there should be quite few cases when private mapping makes
> >> sense for drivers and mapping DMA buffers. I thought about possible
> >> solutions. The question is what to choose.
> >>
> >> 1. Forbid private mappings for drivers in case of hugetlb. (But this
> >> limits functionality - it is not so good)
> >
> > For a long time, this was the "solution" for hugetlbfs.
> >
> >> 2. Allow private mapping. Use hugetlbfs hstates. (But it forces user
> >> to know how much hugetlb memory it is necessary to reserve for
> >> drivers)
> >
> > You can defer working out the reservations until mmap() time,
> > particularly if you are using dynamic hugepage pool resizing instead of
> > static allocation.
> >
> >> 3. Allow private mapping. Use special hstate for driver and driver
> >> should tell how much memory needs to be reserved for it. (Not clear
> >> yet how to behave if we are out of reserved space)
> >>
> >> Could you please suggest which is the best solution? Maybe there are other options?
> >>
> >
> > The only solution that springs to mind is the same one used by hugetlbfs
> > and that is that reservations are taken at mmap() time for the size of the
> > mapping. In your case, you prefault but either way, the hugepages exist.
> >
> Yes, that looks sane. I'll follow this way. In the particular case where
> a driver does not need a private mapping, mmap will return an error.
> Thanks for the advice. I'm about to modify the patches. I'll try to
> involve the hugetlb reservation functions as much as possible and track
> reservations with a special hstate for drivers.
> 

Ok but bear in mind you are now going far down the road of
re-implementing hugetlbfs and you should re-examine why you cannot use
the hidden internal hugetlbfs mount similar to what shared memory does.

> > What then happens for hugetlbfs is that only the process that called mmap()
> > is guaranteed their faults will succeed. If a child process incurs a COW
> > and the hugepages are not available, the child process gets killed. If
> > the parent process performs COW and the huge pages are not available, it
> > unmaps the pages from the child process so that COW becomes unnecessary. If
> > the child process then faults, it gets killed.  This is implemented in
> > mm/hugetlb.c#unmap_ref_private().
> 
> So on an out-of-memory COW, the hugetlb code prefers applications to be
> killed by SIGSEGV (SIGBUS?) instead of OOM. OK.
> 

It prefers to kill the children with SIGKILL rather than have the parent
application randomly fail. This happens when the pool is insufficient for
any part of the application to continue. What it was intended to address
was hugepage-aware applications using MAP_PRIVATE that fork() and exec()
helper applications/monitors, which appears to be fairly common. There was
a sizable window between fork() and exec() where the parent process could
get killed accessing its MAP_PRIVATE area and taking a COW even though the
child would never need it. Guaranteeing that the process that called mmap()
would always fault successfully was better than it being a random choice
between parents and children.
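
As a rough illustration of the policy described above, here is a toy
userspace model in C. It is only a sketch under stated assumptions: every
name in it (toy_pool, toy_mapping, toy_owner_cow and so on) is hypothetical
and none of it is kernel code. It treats the mapping coarsely, tracking one
"child shares the pages" flag rather than per-page state, and it assumes a
COW copy can only be satisfied from free pages that nobody has reserved.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, not a kernel symbol. */
enum fault_result { FAULT_OK, FAULT_KILLED };

struct toy_pool {
    long free_pages;  /* huge pages currently free in the pool */
    long reserved;    /* subset of free_pages promised to mmap() callers */
};

struct toy_mapping {
    long unfaulted;       /* reserved pages the owner has not touched yet */
    bool child_shares;    /* after fork(): child still maps the owner's pages */
    bool child_unmapped;  /* owner took its pages back from the child */
};

/* mmap() time: reserve the whole mapping up front, or fail immediately. */
bool toy_mmap(struct toy_pool *p, struct toy_mapping *m, long npages)
{
    if (npages <= 0 || p->free_pages - p->reserved < npages)
        return false;
    p->reserved += npages;
    m->unfaulted = npages;
    m->child_shares = false;
    m->child_unmapped = false;
    return true;
}

/* Owner's first touch of a page: it consumes its own reservation. */
enum fault_result toy_owner_fault(struct toy_pool *p, struct toy_mapping *m)
{
    assert(m->unfaulted > 0 && p->reserved > 0 && p->free_pages > 0);
    m->unfaulted--;
    p->reserved--;
    p->free_pages--;
    return FAULT_OK;
}

/* fork(): the child starts out sharing the owner's pages. */
void toy_fork(struct toy_mapping *m)
{
    m->child_shares = true;
}

/* Take one free page that is not covered by anyone's reservation. */
static bool toy_take_unreserved(struct toy_pool *p)
{
    if (p->free_pages - p->reserved <= 0)
        return false;
    p->free_pages--;
    return true;
}

/* Child COW: needs a spare page; without one the child is killed. */
enum fault_result toy_child_cow(struct toy_pool *p, struct toy_mapping *m)
{
    if (!m->child_unmapped && toy_take_unreserved(p))
        return FAULT_OK;
    m->child_shares = false;  /* a killed child drops its mappings */
    return FAULT_KILLED;
}

/* Owner COW: never fails; with no spare page it unmaps the child instead. */
enum fault_result toy_owner_cow(struct toy_pool *p, struct toy_mapping *m)
{
    if (m->child_shares && !toy_take_unreserved(p)) {
        m->child_shares = false;
        m->child_unmapped = true;  /* any later child fault is fatal */
    }
    return FAULT_OK;
}
```

With a two-page pool fully reserved and faulted by the owner, a fork()
followed by toy_owner_cow() succeeds by unmapping the child, and a later
toy_child_cow() returns FAULT_KILLED. The owner is never the one to fail,
which is the guarantee the real code aims for.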

The impact is that applications that use MAP_PRIVATE that expect
children to get a full private copy of hugetlb-backed areas are going to
have a bad time but the expectation is that these applications are very
rare and they'll be told "don't do that".

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

