From: Kairui Song <ryncsn@gmail.com>
Date: Fri, 13 Jun 2025 15:36:49 +0800
Subject: Re: [RFC PATCH 2/2] mm: swap: apply per cgroup swap priority mechansim on swap layer
To: YoungJun Park
Cc: Nhat Pham, linux-mm@kvack.org, akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, shikemeng@huaweicloud.com, bhe@redhat.com, baohua@kernel.org, chrisl@kernel.org, muchun.song@linux.dev, iamjoonsoo.kim@lge.com, taejoon.song@lge.com, gunho.lee@lge.com
References: <20250612103743.3385842-1-youngjun.park@lge.com> <20250612103743.3385842-3-youngjun.park@lge.com>
On Fri, Jun 13, 2025 at 3:11 PM YoungJun Park wrote:
>
> On Thu, Jun 12, 2025 at 01:08:08PM -0700, Nhat Pham wrote:
> > On Thu, Jun 12, 2025 at 11:20 AM Kairui Song wrote:
> > >
> > > On Fri, Jun 13, 2025 at 1:28 AM Nhat Pham wrote:
> > > >
> > > > On Thu, Jun 12, 2025 at 4:14 AM Kairui Song wrote:
> > > > >
> > > > > On Thu, Jun 12, 2025 at 6:43 PM wrote:
> > > > > >
> > > > > > From: "youngjun.park"
> > > > >
> > > > > Hi, Youngjun,
> > > > >
> > > > > Thanks for sharing this series.
> > > > >
> > > > > > This patch implements swap device selection and swap on/off propagation
> > > > > > when a cgroup-specific swap priority is set.
> > > > > >
> > > > > > There is one workaround to this implementation as follows.
> > > > > > Current per-cpu swap cluster enforces swap device selection based solely
> > > > > > on CPU locality, overriding the swap cgroup's configured priorities.
> > > > >
> > > > > I've been thinking about this, we can switch to a per-cgroup-per-cpu
> > > > > next cluster selector, the problem with current code is that swap
> > > >
> > > > What about per-cpu-per-order-per-swap-device :-? Number of swap
> > > > devices is gonna be smaller than number of cgroups, right?
> > >
> > > Hi Nhat,
> > >
> > > The problem is per cgroup makes more sense (I was suggested to use
> > > cgroup level locality at the very beginning of the implementation of
> > > the allocator in the mail list, but it was hard to do so at that
> > > time), for container environments, a cgroup is a container that runs
> > > one type of workload, so it has its own locality. Things like systemd
> > > also organize different desktop workloads into cgroups. The whole
> > > point is about cgroup.
> >
> > Yeah I know what cgroup represents. Which is why I mentioned in the
> > next paragraph that we are still making decisions per-cgroup - we
> > just organize the per-cpu cache based on swap devices. This way, two
> > cgroups with similar/same priority lists can share the clusters, for
> > each swapfile, in each CPU. There will be a lot less duplication and
> > overhead. And two cgroups with different priority lists won't
> > interfere with each other, since they'll target different swapfiles.
> > Unless we want to nudge the swapfiles/clusters to be self-partitioned
> > among the cgroups? :) IOW, each cluster contains pages mostly from a
> > single cgroup (with some stragglers mixed in). I suppose that will be
> > very useful for swap on rotational drives where read contiguity is
> > imperative, but not sure about other backends :-?
> >
> > Anyway, no strong opinions to be completely honest :) Was just
> > throwing out some ideas. Per-cgroup-per-cpu-per-order sounds good to
> > me too, if it's easy to do.
>
> Good point!
> I agree with your points about self-partitioned clusters and duplicated
> priority tables. One concern is the cost of synchronization,
> specifically the cost incurred when accessing the prioritized swap device.
> From a simple performance perspective, a per-cgroup-per-CPU implementation
> seems favorable - in line with the current swap allocation fastpath.
>
> It seems most reasonable to carefully compare the pros and cons of the
> two approaches.
>
> To summarize,
>
> Option 1. per-cgroup-per-cpu
> Pros: upstream fit, performance.
> Cons: duplicated priority tables (some memory consumption cost),
> self-partitioned clusters.
>
> Option 2. per-cpu-per-order(per-device)
> Pros: the cons of Option 1.
> Cons: the pros of Option 1.
>
> It's not easy to draw a definitive conclusion right away, and I should
> also evaluate other pros and cons that may arise during actual
> implementation, so I'd like to take some time to review things in more
> detail and share my thoughts and conclusions in the next patch series.
>
> What do you think, Nhat and Kairui?

Ah, I think what might fit best here is: each cgroup has a pcp device
list, and each device has a pcp cluster list:

folio -> mem_cgroup -> swap_priority (maybe a more generic name is
better?) -> swap_device_pcp (recording only the *si per order)

swap_device_info -> swap_cluster_pcp (cluster offset per order)

And if mem_cgroup -> swap_priority is NULL, fall back to a global
swap_device_pcp.
This seems to fit what Nhat suggested, and it should be easy to
implement, since both si and folio->memcg are easily accessible.
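[Editor's note] The two-level lookup sketched in this message could be modeled roughly as below. This is a self-contained userspace sketch, not kernel code: the struct and function names (swap_priority, swap_device_pcp, swap_cluster_pcp, pick_device) are hypothetical stand-ins taken from the wording of this thread, not existing mm/ symbols, and the real implementation would use the kernel's per-CPU machinery rather than plain arrays.

```c
/*
 * Userspace model of the proposed lookup path:
 *   folio -> mem_cgroup -> swap_priority -> swap_device_pcp (*si per order),
 *   falling back to a global swap_device_pcp when the cgroup has no
 *   per-cgroup priority set. All names are hypothetical.
 */
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   4   /* small fixed sizes, just for the model */
#define MAX_ORDER 3

/* Per-cpu, per-order cache of the next cluster offset in one device. */
struct swap_cluster_pcp {
    long next_cluster[NR_CPUS][MAX_ORDER];
};

/* Stand-in for struct swap_info_struct: each device keeps its own
 * pcp cluster list, shared by all cgroups that pick this device. */
struct swap_info {
    int prio;
    struct swap_cluster_pcp clusters;
};

/* Per-cpu, per-order cache recording only the preferred device (*si). */
struct swap_device_pcp {
    struct swap_info *si[NR_CPUS][MAX_ORDER];
};

/* Per-cgroup priority state; a NULL pointer in mem_cgroup means
 * "no per-cgroup priority configured, use the global default". */
struct swap_priority {
    struct swap_device_pcp devs;
};

struct mem_cgroup {
    struct swap_priority *swap_priority;
};

static struct swap_device_pcp global_devs;

/* folio -> memcg -> swap_priority -> si, with fallback to the global pcp */
static struct swap_info *pick_device(struct mem_cgroup *memcg,
                                     int cpu, int order)
{
    struct swap_device_pcp *devs = &global_devs;

    if (memcg && memcg->swap_priority)
        devs = &memcg->swap_priority->devs;
    return devs->si[cpu][order];
}
```

The point the model makes concrete: cgroups with no custom priority all hit the shared global device cache, while a cgroup with its own priority list only overrides which *si is picked; the cluster-level pcp state lives in the device, so cgroups targeting the same device still share clusters.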