From: Lance Yang
Date: Fri, 19 Jan 2024 22:08:44 +0800
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()
To: Michal Hocko
Cc: akpm@linux-foundation.org, zokeefe@google.com, david@redhat.com,
 songmuchun@bytedance.com, shy828301@gmail.com, peterx@redhat.com,
 mknyszek@google.com, minchan@kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20240118120347.61817-1-ioworker0@gmail.com>

On Fri, Jan 19, 2024 at 8:51 PM Michal Hocko wrote:
>
> On Fri 19-01-24 10:03:05, Lance Yang wrote:
> > Hey Michal,
> >
> > Thanks for taking the time to review!
> >
> > On Thu, Jan 18, 2024 at 9:40 PM Michal Hocko wrote:
> > >
> > > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > > [...]
> > >
> > > before we discuss the semantic, let's focus on the usecase.
> > >
> > > > Use Cases
> > > >
> > > > An immediate user of this new functionality is the Go runtime heap allocator
> > > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > > newly allocated chunk through mmap() or a reused chunk released by
> > > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > > respectively. However, both approaches resulted in performance issues; for
> > > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.
> > >
> > > IIUC the primary reason is the cost of the huge page allocation which
> > > can be really high if the memory is heavily fragmented and it is called
> > > synchronously from the process directly, correct? Can that be worked
> >
> > Yes, that's correct.
> >
> > > around by process_madvise and performing the operation from a different
> > > context? Are there any other reasons to have a different mode?
> >
> > In latency-sensitive scenarios, some applications aim to enhance performance
> > by utilizing huge pages as much as possible. At the same time, in case of
> > allocation failure, they prefer a quick return without triggering direct memory
> > reclamation and compaction.
>
> Could you elaborate some more on why?

Previously, the Go runtime attempted to mark all new memory as MADV_HUGEPAGE
on Linux and to manage its hugepage eligibility status. Unfortunately, the
default THP behavior on most Linux distros is that MADV_HUGEPAGE blocks
while the kernel eagerly reclaims and compacts memory to allocate a hugepage.
This direct reclaim and compaction is unbounded, and may result in significant
application thread stalls. In really bad cases, these stalls can exceed
hundreds of milliseconds or even seconds.

The overall strategy of trying to keep hugepages for the heap unbroken is
sound, however, so the Go runtime switched to MADV_COLLAPSE as an alternative.
See https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af

Later, a Google production service experienced a performance regression with
the Go runtime's use of MADV_COLLAPSE. For now, the Go runtime has rolled back
the usage of MADV_COLLAPSE.
See https://github.com/golang/go/issues/63334

If there were a more relaxed (opportunistic) MADV_COLLAPSE, it would avoid
direct reclaim and/or compaction and fail quickly on allocation failure. This
could be beneficial for similar use cases.

BR,
Lance

>
> > > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE -
> > > e.g. non blocking one to make sure that the caller doesn't really block
> > > on resource contention (be it locks or memory availability) because that
> > > matches our non-blocking interface in other areas but having a LIGHT
> > > operation sounds really vague and the exact semantic would be
> > > implementation specific and might change over time. Non-blocking has a
> > > clear semantic but it is not really clear whether that is what you
> > > really need/want.
> >
> > Could you provide me with some suggestions regarding the naming of a
> > more relaxed (opportunistic) MADV_COLLAPSE?
>
> Naming is not all that important at this stage (it could be
> MADV_COLLAPSE_NOBLOCK for example). The primary question is whether
> non-blocking in general is the desired behavior or the implementation
> should try but not too hard.
>
> --
> Michal Hocko
> SUSE Labs
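
To make the two existing calls discussed in this thread concrete, here is a
minimal userspace sketch of backing one hugepage-aligned chunk with
madvise(MADV_HUGEPAGE) and then madvise(MADV_COLLAPSE). It is not the Go
runtime's code; it is written in Python only for brevity. The 2 MiB hugepage
size, the numeric fallbacks 14 and 25 for the advice values, and the helper
names aligned_span(), try_advise() and demo() are assumptions made for
illustration. The proposed MADV_F_COLLAPSE_LIGHT flag is not used, because it
exists only as a proposal in this thread.

```python
import ctypes
import errno
import mmap
import sys

# Assumption: 2 MiB PMD-sized huge pages (the usual x86-64 THP size).
HUGEPAGE_SIZE = 2 * 1024 * 1024

# Assumption: fall back to the values from the generic Linux uapi headers
# when the mmap module does not expose the constants.
MADV_HUGEPAGE = getattr(mmap, "MADV_HUGEPAGE", 14)
MADV_COLLAPSE = getattr(mmap, "MADV_COLLAPSE", 25)  # needs Linux 6.1+


def aligned_span(addr, length, align=HUGEPAGE_SIZE):
    """Return (offset, size) of the largest align-aligned span inside
    [addr, addr + length); offset is relative to addr, size may be 0."""
    start = (addr + align - 1) // align * align
    end = (addr + length) // align * align
    return start - addr, max(0, end - start)


def try_advise(region, advice, offset, size):
    """Call madvise() on region[offset:offset + size].
    Returns 0 on success, otherwise an errno value (never raises OSError)."""
    if size == 0 or not hasattr(region, "madvise"):
        return errno.EINVAL
    try:
        region.madvise(advice, offset, size)
        return 0
    except OSError as exc:
        return exc.errno or errno.EINVAL


def demo():
    # Over-allocate so that one fully aligned hugepage-sized chunk fits.
    region = mmap.mmap(-1, 2 * HUGEPAGE_SIZE, flags=mmap.MAP_PRIVATE)
    try:
        view = ctypes.c_char.from_buffer(region)
        addr = ctypes.addressof(view)
        del view  # drop the buffer export so region.close() can succeed
        offset, size = aligned_span(addr, len(region))
        size = min(size, HUGEPAGE_SIZE)
        results = {"MADV_HUGEPAGE": try_advise(region, MADV_HUGEPAGE, offset, size)}
        # Touch every base page first; collapsing a never-populated range
        # may simply fail.
        for i in range(offset, offset + size, mmap.PAGESIZE):
            region[i] = 1
        results["MADV_COLLAPSE"] = try_advise(region, MADV_COLLAPSE, offset, size)
        return results
    finally:
        region.close()


if __name__ == "__main__" and sys.platform.startswith("linux"):
    for name, err in demo().items():
        print(name, "ok" if err == 0 else errno.errorcode.get(err, err))
```

Both calls are synchronous from the caller's point of view, which is where
the stalls described above come from: any reclaim or compaction the kernel
performs happens before madvise() returns. The sketch only reports the errno
and makes no attempt to measure latency.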