From: Baoquan He <bhe@redhat.com>
To: Barry Song <21cnbao@gmail.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, chrisl@kernel.org,
kasong@tencent.com, youngjun.park@lge.com, aaron.lu@intel.com,
shikemeng@huaweicloud.com, nphamcs@gmail.com
Subject: Re: [PATCH v4 mm-new 1/2] mm/swap: do not choose swap device according to numa node
Date: Wed, 15 Oct 2025 11:06:34 +0800
Message-ID: <aO8PuhNfKOQ8SJUV@MiWiFi-R3L-srv>
In-Reply-To: <CAGsJ_4zvtdqzvBNmQjMy0L6JAz_XpvR=rGgNC_Xdo36QVW-g4g@mail.gmail.com>
On 10/13/25 at 02:09pm, Barry Song wrote:
> > -static int swap_node(struct swap_info_struct *si)
> > -{
> > - struct block_device *bdev;
> > -
> > - if (si->bdev)
> > - bdev = si->bdev;
> > - else
> > - bdev = si->swap_file->f_inode->i_sb->s_bdev;
> > -
> > - return bdev ? bdev->bd_disk->node_id : NUMA_NO_NODE;
> > -}
> > -
>
> Looking at the code, it seems to have some hardware affinity awareness,
> as it uses the swapfile’s bdev’s node_id. Are we regressing cases where
> each node has a closer block device?
I had talked about this with Chris before I posted v1. We don't need to
worry about this because:
1) Kernel code rarely sets disk->node_id; all disks just assign
NUMA_NO_NODE to it, except for these:
drivers/nvdimm/pmem.c <<pmem_attach_disk>>
drivers/md/dm.c <<alloc_dev>>
The Intel SSD that Aaron introduced the node-based si choosing for
should be Optane, which has been discontinued. I could be wrong about
that; if so, I hope Intel can help test so that we can see what impact
this change brings.
2) The gap between disk I/O and memory access
Memory access is usually at nanosecond level, while disk I/O is at
microsecond level, and an HDD can even be at millisecond level. The
nanoseconds saved by node affinity are negligible compared to the disk's
own access time. This includes pmem, whose I/O takes ten times as long
as a memory access, or even more.
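To put rough numbers on point 2, here is a small C sketch of the arithmetic. The latency figures are illustrative order-of-magnitude assumptions, not measurements from any system, and numa_penalty_fraction() is a hypothetical helper name used only for this example.

```c
#include <assert.h>

/* Illustrative order-of-magnitude latencies in nanoseconds.
 * These are assumptions for the sake of arithmetic, not measurements. */
#define REMOTE_NODE_PENALTY_NS 100.0      /* assumed extra cost of a cross-node access */
#define SSD_IO_NS              100000.0   /* assumed ~100 us per SSD I/O */
#define HDD_IO_NS              10000000.0 /* assumed ~10 ms per HDD I/O */

/* Fraction of one I/O's latency that the cross-node penalty represents. */
double numa_penalty_fraction(double penalty_ns, double io_ns)
{
    assert(io_ns > 0.0);
    return penalty_ns / io_ns;
}
```

With these assumed figures the cross-node penalty is about 0.1% of one SSD I/O and about 0.001% of one HDD I/O, which is the sense in which the saving is negligible.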
If there's a real system whose disks do belong to NUMA nodes, we
can test to see whether the new round-robin way is better or worse than
the node-based way.
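For anyone setting up such a test, the two policies being compared can be modelled in user space roughly as below. This is only a toy sketch: struct toy_swap_dev, pick_node_based() and pick_round_robin() are hypothetical names, and the real selection logic in mm/swapfile.c also deals with swap priorities and other details that are left out here.

```c
#include <assert.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical, simplified stand-in for a swap device; this is not the
 * kernel's struct swap_info_struct. */
struct toy_swap_dev {
    int node_id;   /* NUMA node of the backing disk, or NUMA_NO_NODE */
};

/* Node-based choice: return the index of the first device whose disk sits
 * on the caller's node, or -1 if no device reports that node. */
int pick_node_based(const struct toy_swap_dev *devs, size_t n, int cpu_node)
{
    assert(devs != NULL);
    for (size_t i = 0; i < n; i++)
        if (devs[i].node_id != NUMA_NO_NODE && devs[i].node_id == cpu_node)
            return (int)i;
    return -1;
}

/* Round-robin choice: hand out devices in turn, ignoring NUMA nodes.
 * *cursor holds the index to use next and is advanced on each call. */
int pick_round_robin(size_t n, size_t *cursor)
{
    assert(n > 0 && cursor != NULL);
    size_t chosen = *cursor % n;
    *cursor = (chosen + 1) % n;
    return (int)chosen;
}
```

In this model, when every device reports NUMA_NO_NODE (the common case described in point 1), pick_node_based() never finds a match, so the node-based policy has nothing to prefer and only the fallback matters.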
Thanks
Baoquan
Thread overview: 22+ messages
2025-10-11 8:16 [PATCH v4 mm-new 0/2] mm/swapfile.c: select the swap device with default priority round robin Baoquan He
2025-10-11 8:16 ` [PATCH v4 mm-new 1/2] mm/swap: do not choose swap device according to numa node Baoquan He
2025-10-11 20:45 ` kernel test robot
2025-10-11 22:04 ` Andrew Morton
2025-10-12 2:08 ` Baoquan He
2025-10-14 11:56 ` Baoquan He
2025-10-13 6:09 ` Barry Song
2025-10-14 21:50 ` Chris Li
2025-10-15 3:06 ` Baoquan He [this message]
2025-10-15 5:02 ` Barry Song
2025-10-15 6:23 ` Chris Li
2025-10-15 8:09 ` Barry Song
2025-10-15 13:27 ` Chris Li
2025-10-11 8:16 ` [PATCH v4 mm-new 2/2] mm/swap: select swap device with default priority round robin Baoquan He
2025-10-12 20:40 ` Barry Song
2025-10-13 3:58 ` Baoquan He
2025-10-13 6:17 ` Barry Song
2025-10-13 23:07 ` Baoquan He
2025-10-14 22:11 ` Chris Li
2025-10-15 4:29 ` Barry Song
2025-10-15 6:24 ` Chris Li
2025-10-14 22:01 ` Chris Li