From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 21 May 2025 23:55:36 -0400
From: Gregory Price
To: Bharata B Rao
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Jonathan.Cameron@huawei.com, dave.hansen@intel.com, hannes@cmpxchg.org,
	mgorman@techsingularity.net, mingo@redhat.com, peterz@infradead.org,
	raghavendra.kt@amd.com, riel@surriel.com, rientjes@google.com,
	sj@kernel.org, weixugc@google.com, willy@infradead.org,
	ying.huang@linux.alibaba.com, ziy@nvidia.com, dave@stgolabs.net,
	nifan.cxl@gmail.com, joshua.hahnjy@gmail.com, xuezhengchu@huawei.com,
	yiannis@zptcorp.com, akpm@linux-foundation.org, david@redhat.com
Subject: Re: [RFC PATCH v0 2/2] mm: sched: Batch-migrate misplaced pages
References: <20250521080238.209678-1-bharata@amd.com>
	<20250521080238.209678-3-bharata@amd.com>
In-Reply-To: <20250521080238.209678-3-bharata@amd.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, May 21, 2025 at 01:32:38PM +0530, Bharata B Rao wrote:
>
> +static void task_check_pending_migrations(struct task_struct *curr)
> +{
> +	struct callback_head *work = &curr->numa_mig_work;
> +
> +	if (work->next != work)
> +		return;
> +
> +	if (time_after(jiffies, curr->numa_mig_interval) ||
> +	    (curr->migrate_count > NUMAB_BATCH_MIGRATION_THRESHOLD)) {
> +		curr->numa_mig_interval = jiffies + HZ;
> +		task_work_add(curr, work, TWA_RESUME);
> +	}
> +}
> +
>  /*
>   * Drive the periodic memory faults..
>   */
> @@ -3610,6 +3672,8 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
>  	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
>  		return;
>
> +	task_check_pending_migrations(curr);
> +

So I know this was discussed in the cover letter a bit and alluded to in
the patch, but I want to add my 2 cents from work on the unmapped page
cache set.

In that set, I chose to always schedule the task work on the next return
to user-space, rather than defer to a tick like the current numa-balance
code. This was for two concerns:

1) I didn't want to leave a potentially large number of isolated folios
   on a list that may not be reaped for an unknown period of time.

   I don't know the real limitations on the number of isolated folios,
   but given what we have here I think we can represent a mathematical
   worst case on the number of stranded folios.

   If (N=1,000,000 tasks, each holding M=511 isolated folios) then,
   assuming 4KiB folios, we could have roughly 2TB of pages stranded on
   these lists - never to be migrated because no task ever hits the
   threshold.

   In practice this won't happen to that extreme, but in practice it
   absolutely will happen for some chunk of tasks.

   So I chose to never leave kernel space with isolated folios on the
   task numa_mig_list.

   This discussion changes if the numa_mig_list is not on the
   task_struct and is instead some per-cpu list routinely reaped by a
   kthread (kpromoted or whatever).

2) I was not confident I could measure the performance implications of
   the migrations directly when they were deferred. When would I even
   know it happened? The actual goal is to *not* know it happened, right?

   But now it might happen during a page fault, or any random syscall.

   This concerned me - so I just didn't defer. That was largely out of
   lack of confidence in my own understanding of the task_work system.
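
For anyone who wants to check the worst case in point 1, here is a minimal
sketch of the arithmetic in plain user-space C. The helper name
stranded_bytes() is hypothetical, and the 4KiB folio size is an assumption
(larger folios only make the number worse). The reading of N as "tasks" and
M as "isolated folios per task, just under the batch threshold" is also an
assumption about the figures above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: bytes pinned in isolated folios
 * when n_tasks tasks each hold folios_per_task folios of folio_bytes bytes
 * without ever crossing the batch-migration threshold.
 */
static uint64_t stranded_bytes(uint64_t n_tasks, uint64_t folios_per_task,
			       uint64_t folio_bytes)
{
	/* Guard the two multiplications against 64-bit overflow. */
	assert(folios_per_task == 0 || n_tasks <= UINT64_MAX / folios_per_task);
	assert(folio_bytes == 0 ||
	       n_tasks * folios_per_task <= UINT64_MAX / folio_bytes);

	return n_tasks * folios_per_task * folio_bytes;
}
```

Calling stranded_bytes(1000000, 511, 4096) gives 2,093,056,000,000 bytes,
i.e. about 2.09TB (1.90TiB), which lands in the same ballpark as the figure
quoted in point 1.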

So I think this, as presented, is a half-measure - and I don't think
it's a good half-measure.

I think we might need to go all the way to a set of per-cpu migration
lists that a kernel worker can pluck the head of on some interval.

That would bound the number of isolated folios to the number of CPUs
rather than the number of tasks.

~Gregory
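
A rough user-space model of the per-cpu idea above, only to illustrate why
the bound scales with CPUs instead of tasks. This is not kernel code and not
the proposed implementation: every name here (model_queue_folio(),
model_drain(), MODEL_NR_CPUS, MODEL_LIST_CAP) is hypothetical, and the fixed
per-list cap is an extra assumption that the email does not state.

```c
#include <stddef.h>

#define MODEL_NR_CPUS	4	/* hypothetical CPU count */
#define MODEL_LIST_CAP	512	/* hypothetical cap on folios per CPU list */

/* One counter per CPU stands in for a per-cpu list of isolated folios. */
struct model_cpu_list {
	size_t count;
};

static struct model_cpu_list model_lists[MODEL_NR_CPUS];

/*
 * Queue one isolated folio on a CPU's list. Returns 0 on success, or -1
 * if the CPU is invalid or the list is full, in which case the caller
 * would simply not isolate the folio.
 */
static int model_queue_folio(unsigned int cpu)
{
	if (cpu >= MODEL_NR_CPUS || model_lists[cpu].count >= MODEL_LIST_CAP)
		return -1;
	model_lists[cpu].count++;
	return 0;
}

/*
 * Stand-in for the periodic worker: take up to 'batch' folios off one
 * CPU's list. Returns how many were taken.
 */
static size_t model_drain(unsigned int cpu, size_t batch)
{
	size_t n;

	if (cpu >= MODEL_NR_CPUS)
		return 0;
	n = model_lists[cpu].count < batch ? model_lists[cpu].count : batch;
	model_lists[cpu].count -= n;
	return n;
}

/* Total folios currently isolated across all CPU lists. */
static size_t model_total_isolated(void)
{
	size_t total = 0;

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		total += model_lists[cpu].count;
	return total;
}
```

However many tasks call model_queue_folio(), model_total_isolated() can
never exceed MODEL_NR_CPUS * MODEL_LIST_CAP in this model, and the worker
interval bounds how long a folio sits on a list.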