linux-mm.kvack.org archive mirror
From: Cho KyongHo <pullip.cho@samsung.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, hyesoo.yu@samsung.com,
	janghyuck.kim@samsung.com
Subject: Re: [PATCH] mm: sort freelist by rank number
Date: Tue, 4 Aug 2020 19:20:53 +0900	[thread overview]
Message-ID: <20200804102044.GA4655@KEI> (raw)
In-Reply-To: <947a09ba-968b-4c4d-68bb-d13de9c885a1@suse.cz>


On Tue, Aug 04, 2020 at 11:12:55AM +0200, Vlastimil Babka wrote:
> On 8/4/20 4:35 AM, Cho KyongHo wrote:
> > On Mon, Aug 03, 2020 at 05:45:55PM +0200, Vlastimil Babka wrote:
> >> On 8/3/20 9:57 AM, David Hildenbrand wrote:
> >> > On 03.08.20 08:10, pullip.cho@samsung.com wrote:
> >> >> From: Cho KyongHo <pullip.cho@samsung.com>
> >> >> 
> >> >> LPDDR5 introduces a rank switch delay. If three successive DRAM accesses
> >> >> happen, and the first two access one rank while the last one accesses
> >> >> the other rank, the latency of the last access will be longer than that
> >> >> of the second.
> >> >> To address this penalty, we can sort the freelist so that a specific
> >> >> rank is allocated prior to another rank. We expect that the page
> >> >> allocator can then allocate pages from the same rank successively.
> >> >> It will hopefully improve the proportion of consecutive memory
> >> >> accesses to the same rank.
> >> > 
> >> > This certainly needs performance numbers to justify ... and I am sorry,
> >> > "hopefully improves" is not a valid justification :)
> >> > 
> >> > I can imagine that this works well initially, when there hasn't been a
> >> > lot of memory fragmentation going on. But quickly after your system is
> >> > under stress, I doubt this will be very useful. Proof me wrong. ;)
> >> 
> >> Agreed. The implementation of __preferred_rank() seems to be very simple and
> >> optimistic.
> > 
> > DRAM rank is selected by CS bits from the DRAM controller. In most
> > systems the CS bits are allocated to specific bit fields in the bus
> > address. For example, if the CS bit is allocated to bit[16] of the bus
> > (physical) address in a two-rank system, all 16KiB blocks with
> > bit[16] = 1 are in rank 1 and the others are in rank 0.
> > This patch is not beneficial to systems other than mobile devices
> > with LPDDR5. That is why the default behavior of this patch is a no-op.
> 
> Hmm, the patch requires at least pageblock_nr_pages, which is 2MB on x86 (dunno
> about ARM), so 16KiB would be way too small. What are the actual granularities then?

16KiB is just an example. pageblock_nr_pages is 4MB on both ARM and
ARM64. __preferred_rank() works if the rank granularity is >= 4MB.

> >> I think these systems could perhaps better behave as NUMA with (interleaved)
> >> nodes for each rank, then you immediately have all the mempolicies support etc
> >> to achieve what you need? Of course there's some cost as well, but not the costs
> >> of adding hacks to page allocator core?
> > 
> > Thank you for the proposal. NUMA would be helpful for allocating pages
> > from a specific rank programmatically. I should consider NUMA if rank
> > affinity is also required.
> > However, the page allocation overhead of this policy (page migration,
> > reclaim, etc.) will give users worse responsiveness. The intent of
> > this patch is to reduce rank switch delay opportunistically without
> > hurting page allocation speed.
> 
> The problem is, without some control of page migration and reclaim, the simple
> preference approach will not work after some uptime, as David suggested. It will
> just mean that the preferred rank will be allocated first, then the
> non-preferred rank (Linux will fill all unused memory with page cache if
> possible), then reclaim will free memory from both ranks without any special
> care, and new allocations will thus come from both ranks.
> 
In fact, I didn't consider NUMA in that way. I now need to check
whether NUMA gives us the same result as this patch.

Thank you again for your comments about NUMA :)




  reply	other threads:[~2020-08-04 10:28 UTC|newest]

Thread overview: 10+ messages
     [not found] <CGME20200803061805epcas2p20faeeff0b31b23d1bc4464972285b561@epcas2p2.samsung.com>
2020-08-03  6:10 ` pullip.cho
2020-08-03  7:57   ` David Hildenbrand
2020-08-03 15:45     ` Vlastimil Babka
2020-08-04  2:35       ` Cho KyongHo
2020-08-04  9:12         ` Vlastimil Babka
2020-08-04 10:20           ` Cho KyongHo [this message]
2020-08-07  7:08     ` Pekka Enberg
2020-08-10  7:32       ` David Hildenbrand
2020-08-18  8:54         ` Cho KyongHo
2020-08-19 13:12           ` Pekka Enberg
