From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot0-f199.google.com (mail-ot0-f199.google.com [74.125.82.199]) by kanga.kvack.org (Postfix) with ESMTP id 34C146B0009 for ; Wed, 21 Mar 2018 00:21:36 -0400 (EDT) Received: by mail-ot0-f199.google.com with SMTP id i9-v6so2232558oth.3 for ; Tue, 20 Mar 2018 21:21:36 -0700 (PDT) Received: from mail-sor-f41.google.com (mail-sor-f41.google.com. [209.85.220.41]) by mx.google.com with SMTPS id c48-v6sor1441269otj.42.2018.03.20.21.21.34 for (Google Transport Security); Tue, 20 Mar 2018 21:21:35 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20180321015944.GB28705@intel.com> References: <20180320085452.24641-1-aaron.lu@intel.com> <20180320085452.24641-3-aaron.lu@intel.com> <20180321015944.GB28705@intel.com> From: "Figo.zhang" Date: Tue, 20 Mar 2018 21:21:33 -0700 Message-ID: Subject: Re: [RFC PATCH v2 2/4] mm/__free_one_page: skip merge for order-0 page unless compaction failed Content-Type: multipart/alternative; boundary="000000000000c4dba90567e4864b" Sender: owner-linux-mm@kvack.org List-ID: To: Aaron Lu Cc: Linux MM , LKML , Andrew Morton , Huang Ying , Dave Hansen , Kemi Wang , Tim Chen , Andi Kleen , Michal Hocko , Vlastimil Babka , Mel Gorman , Matthew Wilcox , Daniel Jordan --000000000000c4dba90567e4864b Content-Type: text/plain; charset="UTF-8" 2018-03-20 18:59 GMT-07:00 Aaron Lu : > On Tue, Mar 20, 2018 at 03:58:51PM -0700, Figo.zhang wrote: > > 2018-03-20 1:54 GMT-07:00 Aaron Lu : > > > > > Running will-it-scale/page_fault1 process mode workload on a 2 sockets > > > Intel Skylake server showed severe lock contention of zone->lock, as > > > high as about 80%(42% on allocation path and 35% on free path) CPU > > > cycles are burnt spinning. With perf, the most time consuming part > inside > > > that lock on free path is cache missing on page structures, mostly on > > > the to-be-freed page's buddy due to merging. > > > > > > One way to avoid this overhead is not do any merging at all for order-0 > > > pages. With this approach, the lock contention for zone->lock on free > > > path dropped to 1.1% but allocation side still has as high as 42% lock > > > contention. In the meantime, the dropped lock contention on free side > > > doesn't translate to performance increase, instead, it's consumed by > > > increased lock contention of the per node lru_lock(rose from 5% to 37%) > > > and the final performance slightly dropped about 1%. > > > > > > Though performance dropped a little, it almost eliminated zone lock > > > contention on free path and it is the foundation for the next patch > > > that eliminates zone lock contention for allocation path. > > > > > > A new document file called "struct_page_filed" is added to explain > > > the newly reused field in "struct page". > > > > > > Suggested-by: Dave Hansen > > > Signed-off-by: Aaron Lu > > > --- > > > Documentation/vm/struct_page_field | 5 +++ > > > include/linux/mm_types.h | 1 + > > > mm/compaction.c | 13 +++++- > > > mm/internal.h | 27 ++++++++++++ > > > mm/page_alloc.c | 89 ++++++++++++++++++++++++++++++ > > > +++----- > > > 5 files changed, 122 insertions(+), 13 deletions(-) > > > create mode 100644 Documentation/vm/struct_page_field > > > > > > diff --git a/Documentation/vm/struct_page_field > b/Documentation/vm/struct_ > > > page_field > > > new file mode 100644 > > > index 000000000000..1ab6c19ccc7a > > > --- /dev/null > > > +++ b/Documentation/vm/struct_page_field > > > @@ -0,0 +1,5 @@ > > > +buddy_merge_skipped: > > > +Used to indicate this page skipped merging when added to buddy. This > > > +field only makes sense if the page is in Buddy and is order zero. > > > +It's a bug if any higher order pages in Buddy has this field set. > > > +Shares space with index. > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > > > index fd1af6b9591d..7edc4e102a8e 100644 > > > --- a/include/linux/mm_types.h > > > +++ b/include/linux/mm_types.h > > > @@ -91,6 +91,7 @@ struct page { > > > pgoff_t index; /* Our offset within mapping. > */ > > > void *freelist; /* sl[aou]b first free object > */ > > > /* page_deferred_list().prev -- second tail page */ > > > + bool buddy_merge_skipped; /* skipped merging when > added to > > > buddy */ > > > }; > > > > > > union { > > > diff --git a/mm/compaction.c b/mm/compaction.c > > > index 2c8999d027ab..fb9031fdca41 100644 > > > --- a/mm/compaction.c > > > +++ b/mm/compaction.c > > > @@ -776,8 +776,19 @@ isolate_migratepages_block(struct compact_control > > > *cc, unsigned long low_pfn, > > > * potential isolation targets. > > > */ > > > if (PageBuddy(page)) { > > > - unsigned long freepage_order = > > > page_order_unsafe(page); > > > + unsigned long freepage_order; > > > > > > + /* > > > + * If this is a merge_skipped page, do merge > now > > > + * since high-order pages are needed. zone lock > > > + * isn't taken for the merge_skipped check so > the > > > + * check could be wrong but the worst case is > we > > > + * lose a merge opportunity. > > > + */ > > > + if (page_merge_was_skipped(page)) > > > + try_to_merge_page(page); > > > + > > > + freepage_order = page_order_unsafe(page); > > > /* > > > * Without lock, we cannot be sure that what we > > > got is > > > * a valid page order. Consider only values in > the > > > > > > > when the system memory is very very low and try a lot of failures and > then > > >If the system memory is very very low, it doesn't appear there is a need > >to do compaction since compaction needs to have enough order 0 pages to > >make a high order one. > > suppose that in free_one_page() will try to merge to high order anytime , but now in your patch, those merge has postponed when system in low memory status, it is very easy let system trigger low memory state and get poor performance. --000000000000c4dba90567e4864b Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


2018-03-20 18:59 GMT-07:00 Aaron Lu <aaron.lu@intel.com>:
On Tue, Mar 20, 2018 at 03:58:51PM -0700, Figo.zhang wrote:
> 2018-03-20 1:54 GMT-07:00 Aaron Lu <aaron.lu@intel.com>:
>
> > Running will-it-scale/page_fault1 process mode workload on a 2 so= ckets
> > Intel Skylake server showed severe lock contention of zone->lo= ck, as
> > high as about 80%(42% on allocation path and 35% on free path) CP= U
> > cycles are burnt spinning. With perf, the most time consuming par= t inside
> > that lock on free path is cache missing on page structures, mostl= y on
> > the to-be-freed page's buddy due to merging.
> >
> > One way to avoid this overhead is not do any merging at all for o= rder-0
> > pages. With this approach, the lock contention for zone->lock = on free
> > path dropped to 1.1% but allocation side still has as high as 42%= lock
> > contention. In the meantime, the dropped lock contention on free = side
> > doesn't translate to performance increase, instead, it's = consumed by
> > increased lock contention of the per node lru_lock(rose from 5% t= o 37%)
> > and the final performance slightly dropped about 1%.
> >
> > Though performance dropped a little, it almost eliminated zone lo= ck
> > contention on free path and it is the foundation for the next pat= ch
> > that eliminates zone lock contention for allocation path.
> >
> > A new document file called "struct_page_filed" is added= to explain
> > the newly reused field in "struct page".
> >
> > Suggested-by: Dave Hansen <dave.hansen@intel.com>
> > Signed-off-by: Aaron Lu <aaron.lu@intel.com>
> > ---
> >=C2=A0 Documentation/vm/struct_page_field |=C2=A0 5 +++
> >=C2=A0 include/linux/mm_types.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0|=C2=A0 1 +
> >=C2=A0 mm/compaction.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 13 +++++-
> >=C2=A0 mm/internal.h=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 27 ++++++++++++
> >=C2=A0 mm/page_alloc.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 | 89 ++++++++++++++++++++++++++++++
> > +++-----
> >=C2=A0 5 files changed, 122 insertions(+), 13 deletions(-)
> >=C2=A0 create mode 100644 Documentation/vm/struct_page_field<= br> > >
> > diff --git a/Documentation/vm/struct_page_field b/Documentat= ion/vm/struct_
> > page_field
> > new file mode 100644
> > index 000000000000..1ab6c19ccc7a
> > --- /dev/null
> > +++ b/Documentation/vm/struct_page_field
> > @@ -0,0 +1,5 @@
> > +buddy_merge_skipped:
> > +Used to indicate this page skipped merging when added to buddy. = This
> > +field only makes sense if the page is in Buddy and is order zero= .
> > +It's a bug if any higher order pages in Buddy has this field= set.
> > +Shares space with index.
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h<= br> > > index fd1af6b9591d..7edc4e102a8e 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -91,6 +91,7 @@ struct page {
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pgof= f_t index;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* Our offset within mapping. = */
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0void= *freelist;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* sl[aou]b first free object = */
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* p= age_deferred_list().prev=C2=A0 =C2=A0 -- second tail page */
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0bool budd= y_merge_skipped; /* skipped merging when added to
> > buddy */
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0};
> >
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0union {
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 2c8999d027ab..fb9031fdca41 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -776,8 +776,19 @@ isolate_migratepages_block(struct compa= ct_control
> > *cc, unsigned long low_pfn,
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * p= otential isolation targets.
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 */<= br> > >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (= PageBuddy(page)) {
> > -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0unsigned long freepage_order =3D
> > page_order_unsafe(page);
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0unsigned long freepage_order;
> >
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0/*
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * If this is a merge_skipped page, do merge now
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * since high-order pages are needed. zone lock
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * isn't taken for the merge_skipped check so the > > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * check could be wrong but the worst case is we
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 * lose a merge opportunity.
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 */
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0if (page_merge_was_skipped(page))
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0try_to_merge_page(page); > > +
> > +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0freepage_order =3D page_order_unsafe(page);
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0/*
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 * Without lock, we cannot be sure that what we
> > got is
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 * a valid page order. Consider only values in the<= br> > >
>
> when the system memory is very very low and try a lot of failures and = then

>If the system memory is very very low, it doesn't appea= r there is a need
>to do compaction since compaction needs to have en= ough order 0 pages to
>make a high order one.

=C2=A0
suppose that= in free_one_page() will try to merge to high order anytime , but now in yo= ur patch,=C2=A0
those merge has postponed when system in low memo= ry status, it is very easy let system trigger=C2=A0
low memory st= ate and get poor performance.

--000000000000c4dba90567e4864b--