linux-mm.kvack.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Keith Busch <keith.busch@intel.com>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Kees Cook <keescook@chromium.org>, X86 ML <x86@kernel.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>, Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v5 0/5] mm: Randomize free memory
Date: Mon, 17 Dec 2018 08:32:10 -0800
Message-ID: <CAPcyv4iW1812gtiuKz8UTPJPhT0_fg+jgo6Z_6Kt9CR2N0Z4Jg@mail.gmail.com>
In-Reply-To: <2153922.MoOcIFpNeT@aspire.rjw.lan>

On Mon, Dec 17, 2018 at 2:12 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Saturday, December 15, 2018 2:48:30 AM CET Dan Williams wrote:
> > Changes since v4: [1]
> > * Default the randomization to off and enable it dynamically based on
> >   the detection of a memory side cache advertised by platform firmware.
> >   In the case of x86 this enumeration comes from the ACPI HMAT. (Michal
> >   and Mel)
> > * Improve the changelog of the patch that introduces the shuffling to
> >   clarify the motivation and better explain the tradeoffs. (Michal and
> >   Mel)
> > * Include the required HMAT enabling in the series.
> >
> > [1]: https://lkml.kernel.org/r/153922180166.838512.8260339805733812034.stgit@dwillia2-desk3.amr.corp.intel.com
> >
> > ---
> >
> > Quote patch 3:
> >
> > Randomization of the page allocator improves the average utilization of
> > a direct-mapped memory-side-cache. Memory-side caching is a platform
> > capability that Linux has previously been exposed to in HPC
> > (high-performance computing) environments on specialty platforms. In
> > that instance it was a smaller pool of high-bandwidth memory relative to
> > higher-capacity / lower-bandwidth DRAM. Now, this capability is going to
> > be found on general-purpose server platforms where DRAM is a cache in
> > front of higher-latency persistent memory [2].
> >
> > Robert offered an explanation of the state of the art of Linux
> > interactions with memory-side-caches [3], and I copy it here:
> >
> >     It's been a problem in the HPC space:
> >     http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
> >
> >     A kernel module called zonesort is available to try to help:
> >     https://software.intel.com/en-us/articles/xeon-phi-software
> >
> >     and this abandoned patch series proposed that for the kernel:
> >     https://lkml.org/lkml/2017/8/23/195
> >
> >     Dan's patch series doesn't attempt to ensure buffers won't conflict, but
> >     it does reduce the chance that they will. This will make performance
> >     more consistent, albeit slower than "optimal" (which is near impossible
> >     to attain in a general-purpose kernel).  That's better than forcing
> >     users to deploy remedies like:
> >         "To eliminate this gradual degradation, we have added a Stream
> >          measurement to the Node Health Check that follows each job;
> >          nodes are rebooted whenever their measured memory bandwidth
> >          falls below 300 GB/s."
> >
> > A replacement for zonesort was merged upstream in commit cc9aec03e58f
> > "x86/numa_emulation: Introduce uniform split capability". With this
> > numa_emulation capability, memory can be split into cache-sized
> > ("near-memory" sized) NUMA nodes. Binding a workload to such a node, and
> > keeping other workloads off the remaining nodes, enables full cache
> > performance. However, once the workload exceeds the cache size, cache
> > conflicts are unavoidable. While HPC environments might be able to
> > tolerate time-scheduling of cache-sized workloads, on general-purpose
> > server platforms the oversubscribed cache will be the common case.
> >
> > The worst-case scenario is that a server system owner benchmarks a
> > workload at boot with an uncontended cache, only to see performance
> > degrade over time, even below the average cache performance, due to
> > excessive conflicts. Randomization clips the peaks and fills in the
> > valleys of cache utilization to yield steady average performance.
> >
> > See patch 3 for more details.
> >
> > [2]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
> > [3]: https://lkml.org/lkml/2018/9/22/54
>
> Has hibernation been tested with this series applied?

It has not. Is QEMU sufficient? What's your concern?
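A rough model of the direct-mapped cache argument in the quoted cover letter, written as a Python sketch. It is not the kernel implementation and makes simplifying assumptions: a page's cache slot is modeled as its page frame number modulo the cache size in pages, buddy orders and real address indexing are ignored, and the helper names (cache_conflicts, free_list) and the three free-list orders ("sequential", "aliased", "shuffled") are hypothetical. It compares a best-case free list, a contrived worst-case list where pages sharing a slot sit next to each other, and a shuffled list, then checks a cache-sized emulated node and a working set larger than the cache.

```python
import random

def cache_conflicts(pfns, cache_pages):
    """Count pages that alias another page of the same working set.

    Assumption: a direct-mapped memory-side cache whose slot for a page is
    modeled as pfn % cache_pages (real hardware indexes by address bits).
    """
    slots = {pfn % cache_pages for pfn in pfns}
    return len(pfns) - len(slots)

def free_list(total_pages, cache_pages, order, seed=0):
    """Return a hypothetical free-page list (pfns) in one of three orders."""
    pfns = list(range(total_pages))
    if order == "sequential":   # best case for a working set smaller than the cache
        return pfns
    if order == "aliased":      # contrived worst case: pages sharing a slot are adjacent
        return sorted(pfns, key=lambda p: (p % cache_pages, p))
    if order == "shuffled":     # uniformly random order, what randomization aims for
        random.Random(seed).shuffle(pfns)
        return pfns
    raise ValueError(f"unknown order: {order}")

CACHE_PAGES = 1024               # near memory (DRAM acting as cache), in pages
TOTAL_PAGES = 4 * CACHE_PAGES    # far memory behind the cache, in pages
WORKING_SET = CACHE_PAGES // 2   # a workload that would fit in the cache

results = {}
for order in ("sequential", "aliased", "shuffled"):
    allocated = free_list(TOTAL_PAGES, CACHE_PAGES, order)[:WORKING_SET]
    results[order] = cache_conflicts(allocated, CACHE_PAGES)
print(results)

# Repeat the shuffled case with different seeds to see how much it moves.
spread = [
    cache_conflicts(free_list(TOTAL_PAGES, CACHE_PAGES, "shuffled", s)[:WORKING_SET],
                    CACHE_PAGES)
    for s in range(20)
]
print("shuffled min/max over 20 seeds:", min(spread), max(spread))

# A cache-sized, cache-aligned emulated node covers every slot exactly once,
# so a workload bound to it never conflicts.
node1 = list(range(CACHE_PAGES, 2 * CACHE_PAGES))
print("cache-sized node:", cache_conflicts(node1, CACHE_PAGES))

# A working set larger than the cache conflicts under any ordering.
oversubscribed = free_list(TOTAL_PAGES, CACHE_PAGES, "shuffled")[:CACHE_PAGES * 3 // 2]
print("1.5x cache working set:", cache_conflicts(oversubscribed, CACHE_PAGES))
```

In this model the sequential list reports 0 conflicts and the aliased list 384, while the shuffled list lands in between: it gives up the best case but stays well away from the worst case, which is the "clips the peaks and fills in the valleys" tradeoff. The last case conflicts on at least 512 pages regardless of ordering, since 1536 pages cannot fit in 1024 slots.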


Thread overview: 15+ messages
2018-12-15  1:48 Dan Williams
2018-12-15  1:48 ` [PATCH v5 1/5] acpi: Create subtable parsing infrastructure Dan Williams
2018-12-15  1:48 ` [PATCH v5 2/5] acpi/numa: Set the memory-side-cache size in memblocks Dan Williams
2018-12-16 12:34   ` Mike Rapoport
2018-12-15  1:48 ` [PATCH v5 3/5] mm: Shuffle initial free memory to improve memory-side-cache utilization Dan Williams
2018-12-16 12:43   ` Mike Rapoport
2018-12-17 19:56     ` Dan Williams
2018-12-18  9:11       ` Mike Rapoport
2018-12-18 19:07         ` Dan Williams
2018-12-15  1:48 ` [PATCH v5 4/5] mm: Move buddy list manipulations into helpers Dan Williams
2018-12-15  1:48 ` [PATCH v5 5/5] mm: Maintain randomization of page free lists Dan Williams
2018-12-17 10:10 ` [PATCH v5 0/5] mm: Randomize free memory Rafael J. Wysocki
2018-12-17 16:32   ` Dan Williams [this message]
2018-12-18 10:45     ` Rafael J. Wysocki
2018-12-19 20:25       ` Dan Williams
