Re: [PATCH] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Alexander Duyck <alexander.duyck@gmail.com>
To: Jann Horn <jannh@google.com>
Cc: linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.com>,  Vlastimil Babka <vbabka@suse.cz>,
	Pavel Tatashin <pavel.tatashin@microsoft.com>,
	 Oscar Salvador <osalvador@suse.de>,
	Mel Gorman <mgorman@techsingularity.net>,
	 Aaron Lu <aaron.lu@intel.com>, Netdev <netdev@vger.kernel.org>,
	 Alexander Duyck <alexander.h.duyck@linux.intel.com>
Subject: Re: [PATCH] mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocs
Date: Thu, 14 Feb 2019 07:37:21 -0800	[thread overview]
Message-ID: <CAKgT0Ue99Zj88m_mfvnaj3_WspkzfABFsH==B_0X0s85z9Xvaw@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez0v7QwtCKDs5vgRJht8yfZR5nudEpkMOLaDX-=47WeFqA@mail.gmail.com>

On Thu, Feb 14, 2019 at 7:13 AM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Feb 13, 2019 at 11:42 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> > On Wed, Feb 13, 2019 at 12:42 PM Jann Horn <jannh@google.com> wrote:
> > > The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum
> > > number of references that we might need to create in the fastpath later,
> > > the bump-allocation fastpath only has to modify the non-atomic bias value
> > > that tracks the number of extra references we hold instead of the atomic
> > > refcount. The maximum number of allocations we can serve (under the
> > > assumption that no allocation is made with size 0) is nc->size, so that's
> > > the bias used.
> > >
> > > However, even when all memory in the allocation has been given away, a
> > > reference to the page is still held; and in the `offset < 0` slowpath, the
> > > page may be reused if everyone else has dropped their references.
> > > This means that the necessary number of references is actually
> > > `nc->size+1`.
> > >
> > > Luckily, from a quick grep, it looks like the only path that can call
> > > page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which
> > > requires CAP_NET_ADMIN in the init namespace and is only intended to be
> > > used for kernel testing and fuzzing.
> >
> > Actually that has me somewhat concerned. I wouldn't be surprised if
> > most drivers expect the netdev_alloc_frags call to at least output an
> > SKB_DATA_ALIGN sized value.
> >
> > We probably should update __netdev_alloc_frag and __napi_alloc_frag so
> > that they will pass fragsz through SKB_DATA_ALIGN.
>
> Do you want to do a separate patch for that? I'd like to not mix
> logically separate changes in a single patch, and I also don't have a
> good understanding of the alignment concerns here.

You could just include it as a separate patch with your work.
Otherwise I will get to it when I have time.

The point is the issue you pointed out will actually cause other
issues if the behavior is maintained since you shouldn't be getting
unaligned blocks out of the frags API anyway.

> > > To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the
> > > `offset < 0` path, below the virt_to_page() call, and then repeatedly call
> > > writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI,
> > > with a vector consisting of 15 elements containing 1 byte each.
> > >
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Jann Horn <jannh@google.com>
> > > ---
> > >  mm/page_alloc.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 35fdde041f5c..46285d28e43b 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4675,11 +4675,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> > >                 /* Even if we own the page, we do not use atomic_set().
> > >                  * This would break get_page_unless_zero() users.
> > >                  */
> > > -               page_ref_add(page, size - 1);
> > > +               page_ref_add(page, size);
> > >
> > >                 /* reset page count bias and offset to start of new frag */
> > >                 nc->pfmemalloc = page_is_pfmemalloc(page);
> > > -               nc->pagecnt_bias = size;
> > > +               nc->pagecnt_bias = size + 1;
> > >                 nc->offset = size;
> > >         }
> > >
> > > @@ -4695,10 +4695,10 @@ void *page_frag_alloc(struct page_frag_cache *nc,
> > >                 size = nc->size;
> > >  #endif
> > >                 /* OK, page count is 0, we can safely set it */
> > > -               set_page_count(page, size);
> > > +               set_page_count(page, size + 1);
> > >
> > >                 /* reset page count bias and offset to start of new frag */
> > > -               nc->pagecnt_bias = size;
> > > +               nc->pagecnt_bias = size + 1;
> > >                 offset = size - fragsz;
> > >         }
> >
> > If we already have to add a constant it might be better to just use
> > PAGE_FRAG_CACHE_MAX_SIZE + 1 in all these spots where you are having
> > to use "size + 1" instead of "size". That way we can avoid having to
> > add a constant to a register value and then program that value.
> > instead we can just assign the constant value right from the start.
>
> I doubt that these few instructions make a difference, but sure, I can
> send a v2 with that changed.

You would be surprised. They all end up adding up over time.

     prev parent reply	other threads:[~2019-02-14 15:37 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-13 20:41 Jann Horn
2019-02-13 20:59 ` Andrew Morton
2019-02-13 21:11   ` Jann Horn
2019-02-13 21:40     ` Andrew Morton
2019-02-13 22:42 ` Alexander Duyck
2019-02-14 15:13   ` Jann Horn
2019-02-14 15:37     ` Alexander Duyck [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKgT0Ue99Zj88m_mfvnaj3_WspkzfABFsH==B_0X0s85z9Xvaw@mail.gmail.com' \
    --to=alexander.duyck@gmail.com \
    --cc=aaron.lu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=netdev@vger.kernel.org \
    --cc=osalvador@suse.de \
    --cc=pavel.tatashin@microsoft.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox