From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wj0-f197.google.com (mail-wj0-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id EA5436B0033 for ; Fri, 13 Jan 2017 18:56:59 -0500 (EST) Received: by mail-wj0-f197.google.com with SMTP id yr2so2947101wjc.4 for ; Fri, 13 Jan 2017 15:56:59 -0800 (PST) Received: from mail-wm0-x22a.google.com (mail-wm0-x22a.google.com. [2a00:1450:400c:c09::22a]) by mx.google.com with ESMTPS id 32si12870143wru.4.2017.01.13.15.56.58 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 13 Jan 2017 15:56:58 -0800 (PST) Received: by mail-wm0-x22a.google.com with SMTP id n129so15597786wmn.0 for ; Fri, 13 Jan 2017 15:56:58 -0800 (PST) Date: Sat, 14 Jan 2017 02:56:56 +0300 From: "Kirill A. Shutemov" Subject: Re: [LSF/MM TOPIC][LSF/MM ATTEND] Multiple Page Caches, Memory Tiering, Better LRU evictions, Message-ID: <20170113235656.GB26245@node.shutemov.name> References: <61F9233AFAF8C541AAEC03A42CB0D8C7025D002B@MX203CL01.corp.emc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <61F9233AFAF8C541AAEC03A42CB0D8C7025D002B@MX203CL01.corp.emc.com> Sender: owner-linux-mm@kvack.org List-ID: To: "Michaud, Adrian" Cc: "lsf-pc@lists.linux-foundation.org" , "linux-mm@kvack.org" On Fri, Jan 13, 2017 at 09:49:14PM +0000, Michaud, Adrian wrote: > I'd like to attend and propose one or all of the following topics at this year's summit. > > Multiple Page Caches (Software Enhancements) > -------------------------- > Support for multiple page caches can provide many benefits to the kernel. > Different memory types can be put into different page caches. One page > cache for native DDR system memory, another page cache for slower > NV-DIMMs, etc. > General memory can be partitioned into several page caches of different > sizes and could also be dedicated to high priority processes or used > with containers to better isolate memory by dedicating a page cache to a > cgroup process. > Each VMA, or process, could have a page cache identifier, or page > alloc/free callbacks that allow individual VMAs or processes to specify > which page cache they want to use. > Some VMAs might want anonymous memory backed by vast amounts of slower > server class memory like NV-DIMMS. > Some processes or individual VMAs might want their own private page > cache. > Each page cache can have its own eviction policy and low-water markers > Individual page caches could also have their own swap device. Sounds like you're re-inventing NUMA. What am I missing? > Memory Tiering (Software Enhancements) > -------------------- > Using multiple page caches, evictions from one page cache could be moved > and remapped to another page cache instead of unmapped and written to > swap. > If a system has 16GB of high speed DDR memory, and 64GB of slower > memory, one could create a page cache with high speed DDR memory, > another page cache with slower 64GB memory, and evict/copy/remap from > the DDR page cache to the slow memory page cache. Evictions from the > slow memory page cache would then get unmapped and written to swap. I guess it's something that can be done as part of NUMA balancing. > Better LRU evictions (Software and Hardware Enhancements) > ------------------------- > Add a page fault counter to the page struct to help colorize page demand. > We could suggest to Intel/AMD and other architecture leaders that TLB > entries also have a translation counter (8-10 bits is sufficient) > instead of just an "accessed" bit. Scanning/clearing access bits is > obviously inefficient; however, if TLBs had a translation counter > instead of a single accessed bit then scanning and recording the amount > of activity each TLB has would be significantly better and allow us to > bettern calculate LRU pages for evictions. Except that would make memory accesses slower. Even access bit handing is noticible performance hit: processor has to write into page table entry on first access to the page. What you're proposing is making 2^8-2^10 first accesses slower. Sounds like no-go for me. > TLB Shootdown (Hardware Enhancements) > -------------------------- > We should stomp our feet and demand that TLB shootdowns should be > hardware assisted in future architectures. Current TLB shootdown on x86 > is horribly inefficient and obviously doesn't scale. The QPI/UPI local > bus protocol should provide TLB range invalidation broadcast so that a > single CPU can concurrently notify other CPU/cores (with a selection > mask) that a shared TLB entry has changed. Sending an IPI to each core > is horribly inefficient; especially with the core counts increasing and > the frequency of TLB unmapping/remapping also possibly increasing > shortly with new server class memory extension technology. IIUC, the best you can get from hardware is IPI behind the scene. I doubt it worth the effort. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org