In-Reply-To: <20201218104655.GW32193@dhcp22.suse.cz>
From: Pavel Tatashin
Date: Fri, 18 Dec 2020 07:43:15 -0500
Subject: Re: [PATCH v4 08/10] mm/gup: limit number of gup migration failures, honor failures
To: Michal Hocko
Cc: LKML, linux-mm, Andrew Morton, Vlastimil Babka, David Hildenbrand, Oscar Salvador, Dan Williams, Sasha Levin, Tyler Hicks, Joonsoo Kim, mike.kravetz@oracle.com, Steven Rostedt, Ingo Molnar, Jason Gunthorpe, Peter Zijlstra, Mel Gorman, Matthew Wilcox, David Rientjes, John Hubbard, Linux Doc Mailing List, Ira Weiny, linux-kselftest@vger.kernel.org

On Fri, Dec 18, 2020 at 5:46 AM Michal Hocko wrote:
>
> On Thu 17-12-20 13:52:41, Pavel Tatashin wrote:
> [...]
> > +#define PINNABLE_MIGRATE_MAX 10
> > +#define PINNABLE_ISOLATE_MAX 100
>
> Why would we need to limit the isolation retries. Those should always be
> temporary failure unless I am missing something.

Actually, during development I was retrying isolation errors indefinitely,
but during testing I found a hang: when FOLL_TOUCH is passed without
FOLL_WRITE (a fault in the kernel without the write flag), the zero page is
faulted in. Isolating the zero page failed every time, so the process hung.

Since then, I fixed this problem by adding FOLL_WRITE unconditionally to
FOLL_LONGTERM, but I was worried about other possible bugs that could cause
hangs, so I decided to limit isolation errors. If you think it is not
necessary, I can remove the limit on isolation retries.

> I am not sure about the
> PINNABLE_MIGRATE_MAX either. Why do we want to limit that? migrate_pages
> already implements its retry logic why do you want to count retries on
> top of that?
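The two caps being questioned here, PINNABLE_ISOLATE_MAX and PINNABLE_MIGRATE_MAX, can be pictured as one retry loop with two separate failure budgets. The following is a minimal user-space sketch under stated assumptions, not the patch's code: only the two #define values come from the patch; `pass_result`, `pin_with_bounded_retries()` and the simulated passes are hypothetical, and the sketch counts total (not consecutive) failures per category.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Caps taken from the patch under review. */
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100

/* Hypothetical outcome of one isolate-and-migrate pass. */
enum pass_result { PASS_OK, PASS_ISOLATE_FAILED, PASS_MIGRATE_FAILED };

/* Hypothetical callback standing in for one pass over the pages to pin. */
typedef enum pass_result (*pass_fn)(void *ctx);

/*
 * Retry until a pass succeeds, counting isolation and migration failures
 * separately. Give up once either counter reaches its cap, so a page that
 * can never be isolated (like the zero page) cannot hang the caller.
 */
static bool pin_with_bounded_retries(pass_fn pass, void *ctx,
                                     int *isolate_fails, int *migrate_fails)
{
    assert(pass != NULL);
    *isolate_fails = 0;
    *migrate_fails = 0;
    for (;;) {
        switch (pass(ctx)) {
        case PASS_OK:
            return true;
        case PASS_ISOLATE_FAILED:
            if (++*isolate_fails >= PINNABLE_ISOLATE_MAX)
                return false;
            break;
        case PASS_MIGRATE_FAILED:
            if (++*migrate_fails >= PINNABLE_MIGRATE_MAX)
                return false;
            break;
        }
    }
}

/* Simulated pass that never isolates its page (the zero-page hang). */
static enum pass_result always_isolate_fail(void *ctx)
{
    (void)ctx;
    return PASS_ISOLATE_FAILED;
}

/* Simulated pass whose migration fails *ctx more times, then succeeds. */
static enum pass_result migrate_fails_then_ok(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return PASS_MIGRATE_FAILED;
    }
    return PASS_OK;
}
```

With `always_isolate_fail` the loop returns false after PINNABLE_ISOLATE_MAX attempts instead of spinning forever; with `migrate_fails_then_ok` and a counter of 3 it succeeds after three migration failures, well within PINNABLE_MIGRATE_MAX.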
> I do agree that the existing logic is suboptimal because

True, but again, just recently I worked on a race bug where pages can end
up on a per-CPU list after lru_add_drain_all() but before isolation, so I
think a retry is necessary.

> the migration failure might be ephemeral or permanent but that should be
> IMHO addressed at migrate_pages (resp. unmap_and_move) and simply report
> failures that are permanent - e.g. any potential pre-existing long term
> pin - if that is possible at all. If not what would cause permanent
> migration failure? OOM?

Yes, OOM is the main cause of migration failures. There are also a few
cases described in the movable zone comment: during boot, some pages can be
allocated by memblock in the movable zone due to a lack of memory resources
(even if those resources were added later). Hardware page poisoning is
another rare example.

> --
> Michal Hocko
> SUSE Labs
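Michal's suggestion, that migration report only permanent failures so the caller retries just the ephemeral ones, can be sketched in user space as follows. The failure categories mirror the causes named in this thread (a per-CPU list race as transient; OOM, boot-time memblock allocations in the movable zone, and hardware poisoning as permanent). All enum, struct, and function names are hypothetical, and treating OOM as permanent is an assumption made for the sketch, not a description of migrate_pages() behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical migration outcomes, named after the causes in this thread. */
enum mig_fail {
    MIG_OK,        /* page migrated */
    MIG_RACE,      /* transient: e.g. page still on a per-CPU list */
    MIG_OOM,       /* no memory for the destination page */
    MIG_UNMOVABLE, /* e.g. boot-time memblock allocation in the movable zone */
    MIG_HWPOISON,  /* hardware-poisoned page */
};

/* Assumption for this sketch: everything except a race is permanent. */
static bool is_permanent(enum mig_fail f)
{
    return f == MIG_OOM || f == MIG_UNMOVABLE || f == MIG_HWPOISON;
}

/*
 * Call attempt() until it succeeds or reports a permanent failure; give up
 * on a transient failure that persists for max_tries attempts. *tries gets
 * the number of attempts made.
 */
static enum mig_fail migrate_for_pin(enum mig_fail (*attempt)(void *ctx),
                                     void *ctx, int max_tries, int *tries)
{
    assert(attempt != NULL && max_tries > 0);
    *tries = 0;
    for (;;) {
        enum mig_fail f = attempt(ctx);
        (*tries)++;
        if (f == MIG_OK || is_permanent(f) || *tries >= max_tries)
            return f;
    }
}

/* Replays a fixed sequence of outcomes, repeating the last one. */
struct script {
    const enum mig_fail *steps;
    size_t len; /* must be > 0 */
    size_t pos;
};

static enum mig_fail scripted_attempt(void *ctx)
{
    struct script *s = ctx;
    enum mig_fail f = s->steps[s->pos < s->len ? s->pos : s->len - 1];
    if (s->pos < s->len)
        s->pos++;
    return f;
}
```

A script of {MIG_RACE, MIG_OOM, ...} stops on the second attempt with MIG_OOM, while a script that keeps returning MIG_RACE is retried until the cap. The design choice is that the retry budget only ever applies to transient failures; a permanent one is reported immediately.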