From: David Hildenbrand
Subject: Re: [PATCH v3 5/6] mm/gup: migrate pinned pages out of movable zone
Date: Fri, 11 Dec 2020 22:53:00 +0100
Message-Id: <447A41F3-EB94-4DA4-8B98-038B127774A5@redhat.com>
To: Pavel Tatashin
Cc: David Hildenbrand, Jason Gunthorpe, LKML, linux-mm, Andrew Morton,
    Vlastimil Babka, Michal Hocko, Oscar Salvador, Dan Williams,
    Sasha Levin, Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com,
    Steven Rostedt, Ingo Molnar, Peter Zijlstra, Mel Gorman,
    Matthew Wilcox, David Rientjes, John Hubbard,
    Linux Doc Mailing List

> On 11.12.2020 at 22:36, Pavel Tatashin wrote:
>
> On Fri, Dec 11, 2020 at 4:29 PM David Hildenbrand wrote:
>>
>>
>>>> On 11.12.2020 at 22:09, Pavel Tatashin wrote:
>>>
>>> On Fri, Dec 11, 2020 at 3:46 PM Jason Gunthorpe wrote:
>>>>
>>>>> On Fri, Dec 11, 2020 at 03:40:57PM -0500, Pavel Tatashin wrote:
>>>>> On Fri, Dec 11, 2020 at 3:23 PM Jason Gunthorpe
wrote:
>>>>>>
>>>>>> On Fri, Dec 11, 2020 at 03:21:39PM -0500, Pavel Tatashin wrote:
>>>>>>> @@ -1593,7 +1592,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>>>>>>>                          }
>>>>>>>
>>>>>>>                          if (!isolate_lru_page(head)) {
>>>>>>> -                                list_add_tail(&head->lru, &cma_page_list);
>>>>>>> +                                list_add_tail(&head->lru, &movable_page_list);
>>>>>>>                                  mod_node_page_state(page_pgdat(head),
>>>>>>>                                                      NR_ISOLATED_ANON +
>>>>>>>                                                      page_is_file_lru(head),
>>>>>>> @@ -1605,7 +1604,7 @@ static long check_and_migrate_cma_pages(struct mm_struct *mm,
>>>>>>>                  i += step;
>>>>>>>          }
>>>>>>>
>>>>>>> -        if (!list_empty(&cma_page_list)) {
>>>>>>> +        if (!list_empty(&movable_page_list)) {
>>>>>>
>>>>>> You didn't answer my earlier question, is it OK that ZONE_MOVABLE
>>>>>> pages leak out here if isolate_lru_page() fails but the
>>>>>> movable_page_list is empty?
>>>>>>
>>>>>> I think the answer is no, right?
>>>>> In my opinion it is OK. We are doing our best to not pin movable
>>>>> pages, but if isolate_lru_page() fails because pages are currently
>>>>> locked by someone else, we will end up long-term pinning them.
>>>>> See comment in this patch:
>>>>> +        *    1. Pinned pages: (long-term) pinning of movable pages is avoided
>>>>> +        *       when pages are pinned and faulted, but it is still possible that
>>>>> +        *       address space already has pages in ZONE_MOVABLE at the time when
>>>>> +        *       pages are pinned (i.e. user has touches that memory before
>>>>> +        *       pinning). In such case we try to migrate them to a different zone,
>>>>> +        *       but if migration fails the pages can still end-up pinned in
>>>>> +        *       ZONE_MOVABLE. In such case, memory offlining might retry a long
>>>>> +        *       time and will only succeed once user application unpins pages.
>>>>
>>>> It is not "retry a long time" it is "might never complete" because
>>>> userspace will hold the DMA pin indefinitely.
>>>>
>>>> Confused what the point of all this is then ??
>>>>
>>>> I thought the goal here is to make memory unplug reliable, if you leave
>>>> a hole like this then any hostile userspace can block it forever.
>>>
>>> You are right, I used a wording from the previous comment, and it
>>> should be made clear that pin may be forever. Without these patches it
>>> is guaranteed that hot-remove will fail if there are pinned pages as
>>> ZONE_MOVABLE is actually the first to be searched. Now, it will fail
>>> only due to exceptions listed in ZONE_MOVABLE comment:
>>>
>>> 1. pin + migration/isolation failure
>>
>> Not sure what that really means. We have short-term pinnings (although
>> we might have a better term for "pinning" here) for example, when a
>> process dies (IIRC). There is a period where pages cannot get migrated
>> and offlining code has to retry (which might take a while). This still
>> applies after your change - are you referring to that?
>>
>>> 2. memblock allocation due to limited amount of space for kernelcore
>>> 3. memory holes
>>> 4. hwpoison
>>> 5. Unmovable PG_offline pages (? need to study why this is a scenario).
>>
>> Virtio-mem is the primary user in this context.
>>
>>> Do you think we should unconditionally unpin pages, and return error
>>> when isolation/migration fails?
>>
>> I'm not sure what you mean here. Who's supposed to unpin which pages?
>
> Hi David,
>
> When check_and_migrate_movable_pages() is called, the pages are
> already pinned. If some of those pages are in movable zone, and we
> fail to migrate or isolate them what should we do: proceed, and keep
> it as exception of when movable zone can actually have pinned pages or
> unpin all pages in the array, and return an error, or unpin only pages
> in movable zone, and return an error?
>

I guess revert what we did (unpin) and return an error. The interesting
question is what can make migration/isolation fail

a) out of memory: smells like a zone setup issue.
Failures are acceptable I guess.
b) short-term pinnings: process dying - not relevant I guess. Other
cases? (Fork?)
c) ?

Once we have clarified that, we will actually know how likely it is that
we return an error (and make vfio pinnings fail, etc.).

> Pasha
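
For illustration, the "revert what we did (unpin) and return an error"
option could be modeled in plain user-space C roughly as follows. This is
only a sketch under stated assumptions, not the kernel implementation:
struct model_page, model_unpin_all() and model_check_and_migrate() are
hypothetical names, the can_migrate flag stands in for whatever makes
isolation/migration succeed or fail, and -EBUSY is an assumed error code
that nobody in this thread has proposed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; NOT the kernel type. */
struct model_page {
    bool in_movable_zone; /* page currently sits in ZONE_MOVABLE */
    bool can_migrate;     /* isolation + migration would succeed */
    int  pin_count;       /* models the pin taken before the check */
};

/* Drop the pin that was taken earlier on every page in the array. */
static void model_unpin_all(struct model_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].pin_count--;
}

/*
 * Model of the discussed policy: the pages are already pinned when this
 * runs. If a movable-zone page cannot be migrated, unpin ALL pages in
 * the array and report an error (assumed here to be -EBUSY) instead of
 * leaving a pin in ZONE_MOVABLE.
 */
static int model_check_and_migrate(struct model_page *pages, size_t n)
{
    assert(pages != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!pages[i].in_movable_zone)
            continue;
        if (!pages[i].can_migrate) {
            model_unpin_all(pages, n);
            return -EBUSY;
        }
        /* Successful migration: the page now lives outside ZONE_MOVABLE. */
        pages[i].in_movable_zone = false;
    }
    return 0;
}
```

With this policy a caller either gets all pages pinned outside the movable
zone or gets an error with no pins left behind. How often the error path
triggers is exactly the open question about cases a) to c) above.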