linux-mm.kvack.org archive mirror
* reply: [PATCHv5] mm: skip CMA pages when they are not available
@ 2024-08-13  9:58 黄朝阳 (Zhaoyang Huang)
  2024-08-16 17:20 ` Breno Leitao
  0 siblings, 1 reply; 2+ messages in thread
From: 黄朝阳 (Zhaoyang Huang) @ 2024-08-13  9:58 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Andrew Morton, Matthew Wilcox, Suren Baghdasaryan, Minchan Kim,
	linux-mm, linux-kernel, Zhaoyang Huang,
	王科 (Ke Wang),
	usamaarif642, riel, hannes, nphamcs

>
>On Wed, May 31, 2023 at 10:51:01AM +0800, zhaoyang.huang wrote:
>> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>>
>> This patch fixes unproductive reclaiming of CMA pages by skipping them
>> when they are not available to the current context. It arises from the
>> OOM issue below, which is caused by a large proportion of MIGRATE_CMA
>> pages among free pages.
>
>Hello,
>
>I've been looking into a problem with high memory pressure causing OOMs in
>some of our workloads, and it seems that this change may have introduced lock
>contention when there is high memory pressure.
>
>I've collected some metrics for my specific workload that suggest this change
>has increased the lruvec->lru_lock waittime-max by 500x and the
>waittime-avg by 20x.
>
>Experiment
>==========
>
>The experiment involved 100 hosts, each with 64GB of memory and a single
>Xeon 8321HC CPU. The experiment ran for over 80 hours.
>
>Half of the hosts (50) were configured with the patch reverted and lock stat
>enabled, while the other half ran the upstream version.
>All machines had hugetlb_cma=6G set as a command-line argument.
>
>In this context, "upstream" refers to kernel release 6.9 with some minor
>changes that should not impact the results.
>
>Workload
>========
>
>The workload is a Java-based application that fully utilizes the memory;
>in fact, the JVM runs with `-Xms50735m -Xmx50735m` arguments.
>
>Results
>=======
>
>A few values from lockstat:
>
>                  waittime-max   waittime-total   waittime-avg   holdtime-max
>6.9:                    242889       15618873933            715          17485
>6.9-with-revert:           487         688563299             34            464
>
>The full data can be seen at:
>https://docs.google.com/spreadsheets/d/1Dl-8ImlE4OZrfKjbyWAIWWuQtgD3fwEEl9INaZQZ4e8/edit?usp=sharing
>
>Possible causes:
>================
>
>I've been discussing this with colleagues and we're speculating that the high
>contention might be linked to the fact that CMA regions are now being skipped.
>This could potentially extend the duration of the
>isolate_lru_folios() 'while' loop, resulting in increased pressure on the lock.
>
>However, I want to emphasize that I'm not an expert in this area and I am
>simply sharing the data I collected.
Could you please try the patch below? It could be helpful:

https://lore.kernel.org/linux-mm/CAOUHufa7OBtNHKMhfu8wOOE4f0w3b0_2KzzV7-hrc9rVL8e=iw@mail.gmail.com/

