* [RFC] Test catalog template
@ 2024-10-14 20:32 Donald Zickus
From: Donald Zickus @ 2024-10-14 20:32 UTC (permalink / raw)
To: workflows, automated-testing, linux-kselftest, kernelci
Cc: Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao
Hi,
At Linux Plumbers, a few dozen of us gathered together to discuss how
to expose what tests subsystem maintainers would like to run for every
patch submitted or when CI runs tests. We agreed on a mock up of a
yaml template to start gathering info. The yaml file could be
temporarily stored on kernelci.org until a more permanent home could
be found. Attached is a template to start the conversation.
Longer story.
The current problem is CI systems are not unanimous about what tests
they run on submitted patches or git branches. This makes it
difficult to figure out why a test failed or how to reproduce.
Further, it isn't always clear what tests a normal contributor should
run before posting patches.
It has been long communicated that the tests LTP, xfstest and/or
kselftests should be the tests to run. However, not all maintainers
use those tests for their subsystems. I am hoping to either capture
those tests or find ways to convince them to add their tests to the
preferred locations.
The goal is for a given subsystem (defined in MAINTAINERS), define a
set of tests that should be run for any contributions to that
subsystem. The hope is the collective CI results can be triaged
collectively (because they are related) and even have the numerous
flakes waived collectively (same reason) improving the ability to
find and debug new test failures. Because the tests and process are
known, having a human help debug any failures becomes easier.
The plan is to put together a minimal yaml template that gets us going
(even if it is not optimized yet) and aim for about a dozen or so
subsystems. At that point we should have enough feedback to promote
this more seriously and talk optimizations.
Feedback encouraged.
Cheers,
Don
---
# List of tests by subsystem
#
# Tests should adhere to KTAP definitions for results
#
# Description of section entries
#
# maintainer: test maintainer - name <email>
# list: mailing list for discussion
# version: stable version of the test
# dependency: necessary distro package for testing
# test:
#   path: internal git path or url to fetch from
#   cmd: command to run; ability to run locally
#   param: additional param necessary to run test
# hardware: hardware necessary for validation
#
# Subsystems (alphabetical)

KUNIT TEST:
  maintainer:
    - name: name1
      email: email1
    - name: name2
      email: email2
  list:
  version:
  dependency:
    - dep1
    - dep2
  test:
    - path: tools/testing/kunit
      cmd:
      param:
    - path:
      cmd:
      param:
  hardware: none

* RE: [Automated-testing] [RFC] Test catalog template
From: Bird, Tim @ 2024-10-15 16:01 UTC (permalink / raw)
To: Don Zickus, workflows, automated-testing, linux-kselftest, kernelci
Cc: Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

> -----Original Message-----
> From: automated-testing@lists.yoctoproject.org <automated-testing@lists.yoctoproject.org> On Behalf Of Don Zickus
> Hi,
>
> At Linux Plumbers, a few dozen of us gathered together to discuss how
> to expose what tests subsystem maintainers would like to run for every
> patch submitted or when CI runs tests. We agreed on a mock up of a
> yaml template to start gathering info. The yaml file could be
> temporarily stored on kernelci.org until a more permanent home could
> be found. Attached is a template to start the conversation.
>

Don,

I'm interested in this initiative. Is discussion going to be on a kernel mailing
list, or on this e-mail, or somewhere else?

See a few comments below.

> Longer story.
>
> The current problem is CI systems are not unanimous about what tests
> they run on submitted patches or git branches. This makes it
> difficult to figure out why a test failed or how to reproduce.
> Further, it isn't always clear what tests a normal contributor should
> run before posting patches.
>
> It has been long communicated that the tests LTP, xfstest and/or
> kselftests should be the tests to run.

Just saying "LTP" is not granular enough. LTP has hundreds of individual
test programs, and it would be useful to specify the individual tests
from LTP that should be run per sub-system.

I was particularly intrigued by the presentation at Plumbers about
test coverage. It would be nice to have data (or easily replicable
methods) for determining the code coverage of a test or set of
tests, to indicate what parts of the kernel are being missed
and help drive new test development.

> However, not all maintainers
> use those tests for their subsystems. I am hoping to either capture
> those tests or find ways to convince them to add their tests to the
> preferred locations.
>
> The goal is for a given subsystem (defined in MAINTAINERS), define a
> set of tests that should be run for any contributions to that
> subsystem. The hope is the collective CI results can be triaged
> collectively (because they are related) and even have the numerous
> flakes waived collectively (same reason) improving the ability to
> find and debug new test failures. Because the tests and process are
> known, having a human help debug any failures becomes easier.
>
> The plan is to put together a minimal yaml template that gets us going
> (even if it is not optimized yet) and aim for about a dozen or so
> subsystems. At that point we should have enough feedback to promote
> this more seriously and talk optimizations.

Sounds like a good place to start. Do we have some candidate sub-systems
in mind? Has anyone volunteered to lead the way?

>
> Feedback encouraged.
>
> Cheers,
> Don
>
> ---
> # List of tests by subsystem
> #
> # Tests should adhere to KTAP definitions for results
> #
> # Description of section entries
> #
> # maintainer: test maintainer - name <email>
> # list: mailing list for discussion
> # version: stable version of the test
> # dependency: necessary distro package for testing
> # test:
> #   path: internal git path or url to fetch from
> #   cmd: command to run; ability to run locally
> #   param: additional param necessary to run test
> # hardware: hardware necessary for validation

Is this something new in MAINTAINERS, or is it a separate file?

> #
> # Subsystems (alphabetical)
>
> KUNIT TEST:
>   maintainer:
>     - name: name1
>       email: email1
>     - name: name2
>       email: email2
>   list:
>   version:
>   dependency:
>     - dep1
>     - dep2
>   test:
>     - path: tools/testing/kunit
>       cmd:
>       param:
>     - path:
>       cmd:
>       param:
>   hardware: none

Looks OK so far - it'd be nice to have a few concrete examples.
 -- Tim

* Re: [Automated-testing] [RFC] Test catalog template
From: Cyril Hrubis @ 2024-10-16 13:10 UTC (permalink / raw)
To: Tim Bird
Cc: Don Zickus, workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

Hi!
> Just saying "LTP" is not granular enough. LTP has hundreds of individual
> test programs, and it would be useful to specify the individual tests
> from LTP that should be run per sub-system.

A few thousand tests to be more precise, and also the content tends to
change between releases, be it test additions or removal, and I do not
think this level of changes is something that makes sense to be tracked
in such a database.

It may be better to have a more generic description of LTP subsets, there
are a few obvious ones, e.g. "SysV IPC" or "Timers", and have the LTP
testrunner map that to actual testcases. The hard task here is to figure
out which groups would be useful and keep the set reasonably small.

I can move this forward in LTP reasonably quickly if we get a small list of
useful groups from kernel developers.

--
Cyril Hrubis
chrubis@suse.cz
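
A rough sketch of how one such LTP subset could be expressed in the proposed template, assuming the groups end up mapping to LTP runtest files; the subsystem name, maintainer fields, version and the "ipc" group name below are illustrative placeholders only:

---
SYSV IPC:
  maintainer:
    - name: placeholder-name          # placeholder, not a real maintainer entry
      email: placeholder@example.org
  list: ltp@lists.linux.it
  version: 20240930                   # illustrative LTP release tag
  dependency:
    - ltp
  test:
    - path: /opt/ltp                  # default LTP install prefix
      cmd: ./runltp
      param: -f ipc                   # "ipc" stands in for whatever group name LTP exports
  hardware: none
---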

* Re: [Automated-testing] [RFC] Test catalog template
From: Donald Zickus @ 2024-10-16 18:02 UTC (permalink / raw)
To: Cyril Hrubis
Cc: Tim Bird, workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

Hi Cyril,

On Wed, Oct 16, 2024 at 9:11 AM Cyril Hrubis <chrubis@suse.cz> wrote:
>
> Hi!
> > Just saying "LTP" is not granular enough. LTP has hundreds of individual
> > test programs, and it would be useful to specify the individual tests
> > from LTP that should be run per sub-system.
>
> A few thousand tests to be more precise, and also the content tends to
> change between releases, be it test additions or removal, and I do not
> think this level of changes is something that makes sense to be tracked
> in such a database.
>
> It may be better to have a more generic description of LTP subsets, there
> are a few obvious ones, e.g. "SysV IPC" or "Timers", and have the LTP
> testrunner map that to actual testcases. The hard task here is to figure
> out which groups would be useful and keep the set reasonably small.
>
> I can move this forward in LTP reasonably quickly if we get a small list of
> useful groups from kernel developers.

Thanks!  The thought was if we wanted to encourage contributors to run
these tests before submitting, does running the whole LTP testsuite
make sense or like you said a targeted set would be much better?

Cheers,
Don

>
> --
> Cyril Hrubis
> chrubis@suse.cz
>

* Re: [Automated-testing] [RFC] Test catalog template
From: Cyril Hrubis @ 2024-10-17 11:01 UTC (permalink / raw)
To: Donald Zickus
Cc: Tim Bird, workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

Hi!
> > A few thousand tests to be more precise, and also the content tends to
> > change between releases, be it test additions or removal, and I do not
> > think this level of changes is something that makes sense to be tracked
> > in such a database.
> >
> > It may be better to have a more generic description of LTP subsets, there
> > are a few obvious ones, e.g. "SysV IPC" or "Timers", and have the LTP
> > testrunner map that to actual testcases. The hard task here is to figure
> > out which groups would be useful and keep the set reasonably small.
> >
> > I can move this forward in LTP reasonably quickly if we get a small list of
> > useful groups from kernel developers.
>
> Thanks!  The thought was if we wanted to encourage contributors to run
> these tests before submitting, does running the whole LTP testsuite
> make sense or like you said a targeted set would be much better?

The best answer is "it depends". The whole LTP run can take hours on
slower hardware and may not even test the code you wanted to test, e.g.
if you did changes to compat code you have to build LTP with -m32 to
actually exercise the 32-bit emulation layer.

If you changed kernel core it may make sense to run the whole LTP; on
the other hand, changes isolated to certain subsystems, e.g. SysV IPC,
Timers, Cgroups, etc., could be tested fairly quickly with a subset of
LTP. So I think that we need some kind of mapping or heuristics so that
we can map certain use cases to subsets of tests.

--
Cyril Hrubis
chrubis@suse.cz

* Re: [Automated-testing] [RFC] Test catalog template
From: Donald Zickus @ 2024-10-16 18:00 UTC (permalink / raw)
To: Bird, Tim
Cc: workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

Hi Tim,

On Tue, Oct 15, 2024 at 12:01 PM Bird, Tim <Tim.Bird@sony.com> wrote:
>
> > -----Original Message-----
> > From: automated-testing@lists.yoctoproject.org <automated-testing@lists.yoctoproject.org> On Behalf Of Don Zickus
> > Hi,
> >
> > At Linux Plumbers, a few dozen of us gathered together to discuss how
> > to expose what tests subsystem maintainers would like to run for every
> > patch submitted or when CI runs tests. We agreed on a mock up of a
> > yaml template to start gathering info. The yaml file could be
> > temporarily stored on kernelci.org until a more permanent home could
> > be found. Attached is a template to start the conversation.
> >
>
> Don,
>
> I'm interested in this initiative. Is discussion going to be on a kernel mailing
> list, or on this e-mail, or somewhere else?

I was going to keep it on this mailing list. Open to adding other
lists or moving it.

>
> See a few comments below.
>
> > Longer story.
> >
> > The current problem is CI systems are not unanimous about what tests
> > they run on submitted patches or git branches. This makes it
> > difficult to figure out why a test failed or how to reproduce.
> > Further, it isn't always clear what tests a normal contributor should
> > run before posting patches.
> >
> > It has been long communicated that the tests LTP, xfstest and/or
> > kselftests should be the tests to run.
> Just saying "LTP" is not granular enough. LTP has hundreds of individual
> test programs, and it would be useful to specify the individual tests
> from LTP that should be run per sub-system.

Agreed.  Just reiterating what Greg has told me.

>
> I was particularly intrigued by the presentation at Plumbers about
> test coverage. It would be nice to have data (or easily replicable
> methods) for determining the code coverage of a test or set of
> tests, to indicate what parts of the kernel are being missed
> and help drive new test development.

It would be nice. I see that as orthogonal to this effort for now.
But I think this might be a good step towards that idea.

>
> > However, not all maintainers
> > use those tests for their subsystems. I am hoping to either capture
> > those tests or find ways to convince them to add their tests to the
> > preferred locations.
> >
> > The goal is for a given subsystem (defined in MAINTAINERS), define a
> > set of tests that should be run for any contributions to that
> > subsystem. The hope is the collective CI results can be triaged
> > collectively (because they are related) and even have the numerous
> > flakes waived collectively (same reason) improving the ability to
> > find and debug new test failures. Because the tests and process are
> > known, having a human help debug any failures becomes easier.
> >
> > The plan is to put together a minimal yaml template that gets us going
> > (even if it is not optimized yet) and aim for about a dozen or so
> > subsystems. At that point we should have enough feedback to promote
> > this more seriously and talk optimizations.
>
> Sounds like a good place to start. Do we have some candidate sub-systems
> in mind? Has anyone volunteered to lead the way?

At our meeting, someone suggested KUnit as it was easy to understand
for starters and then add a few other volunteer systems in. I know we
have a few maintainers who can probably help us get started. I think
arm and media were ones thrown about at our meeting.

>
> >
> > Feedback encouraged.
> >
> > Cheers,
> > Don
> >
> > ---
> > # List of tests by subsystem
> > #
> > # Tests should adhere to KTAP definitions for results
> > #
> > # Description of section entries
> > #
> > # maintainer: test maintainer - name <email>
> > # list: mailing list for discussion
> > # version: stable version of the test
> > # dependency: necessary distro package for testing
> > # test:
> > #   path: internal git path or url to fetch from
> > #   cmd: command to run; ability to run locally
> > #   param: additional param necessary to run test
> > #   hardware: hardware necessary for validation
>
> Is this something new in MAINTAINERS, or is it a separate file?

For now a separate file. It isn't clear where this could go long
term. The thought was to gather data to see what is necessary first.
Long term it will probably stay a separate file. *shrugs*

>
> > #
> > # Subsystems (alphabetical)
> >
> > KUNIT TEST:
> >   maintainer:
> >     - name: name1
> >       email: email1
> >     - name: name2
> >       email: email2
> >   list:
> >   version:
> >   dependency:
> >     - dep1
> >     - dep2
> >   test:
> >     - path: tools/testing/kunit
> >       cmd:
> >       param:
> >     - path:
> >       cmd:
> >       param:
> >   hardware: none
>
> Looks OK so far - it'd be nice to have a few concrete examples.

Fair enough.  Let me try and work on some.

Cheers,
Don

> -- Tim
>
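
A purely illustrative sketch of the kind of concrete, filled-in entry being asked for above, using a kselftest target as the test; the maintainer names, dependency list and chosen TARGETS value are placeholders, not real catalog data:

---
KERNEL SELFTEST FRAMEWORK:
  maintainer:
    - name: placeholder-name          # placeholder, not the real MAINTAINERS entry
      email: placeholder@example.org
  list: linux-kselftest@vger.kernel.org
  version:
  dependency:
    - make
    - gcc
  test:
    - path: .
      cmd: make
      param: -C tools/testing/selftests TARGETS=timers run_tests
  hardware: none
---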

* Re: [RFC] Test catalog template
From: Minas Hambardzumyan @ 2024-10-17 12:31 UTC (permalink / raw)
To: Donald Zickus, workflows, automated-testing, linux-kselftest, kernelci
Cc: Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

On 10/14/24 15:32, Donald Zickus wrote:
> Hi,
>
> At Linux Plumbers, a few dozen of us gathered together to discuss how
> to expose what tests subsystem maintainers would like to run for every
> patch submitted or when CI runs tests. We agreed on a mock up of a
> yaml template to start gathering info. The yaml file could be
> temporarily stored on kernelci.org until a more permanent home could
> be found. Attached is a template to start the conversation.
>
> Longer story.
>
> The current problem is CI systems are not unanimous about what tests
> they run on submitted patches or git branches. This makes it
> difficult to figure out why a test failed or how to reproduce.
> Further, it isn't always clear what tests a normal contributor should
> run before posting patches.
>
> It has been long communicated that the tests LTP, xfstest and/or
> kselftests should be the tests to run. However, not all maintainers
> use those tests for their subsystems. I am hoping to either capture
> those tests or find ways to convince them to add their tests to the
> preferred locations.
>
> The goal is for a given subsystem (defined in MAINTAINERS), define a
> set of tests that should be run for any contributions to that
> subsystem. The hope is the collective CI results can be triaged
> collectively (because they are related) and even have the numerous
> flakes waived collectively (same reason) improving the ability to
> find and debug new test failures. Because the tests and process are
> known, having a human help debug any failures becomes easier.
>
> The plan is to put together a minimal yaml template that gets us going
> (even if it is not optimized yet) and aim for about a dozen or so
> subsystems. At that point we should have enough feedback to promote
> this more seriously and talk optimizations.
>
> Feedback encouraged.
>
> Cheers,
> Don
>
> ---
> # List of tests by subsystem
> #
> # Tests should adhere to KTAP definitions for results
> #
> # Description of section entries
> #
> # maintainer: test maintainer - name <email>
> # list: mailing list for discussion
> # version: stable version of the test
> # dependency: necessary distro package for testing
> # test:
> #   path: internal git path or url to fetch from
> #   cmd: command to run; ability to run locally
> #   param: additional param necessary to run test
> # hardware: hardware necessary for validation
> #
> # Subsystems (alphabetical)
>
> KUNIT TEST:
>   maintainer:
>     - name: name1
>       email: email1
>     - name: name2
>       email: email2
>   list:
>   version:
>   dependency:
>     - dep1
>     - dep2
>   test:
>     - path: tools/testing/kunit
>       cmd:
>       param:
>     - path:
>       cmd:
>       param:
>   hardware: none
>

Don,

thanks for initiating this! I have a few questions/suggestions:

I think the root element in a section (`KUNIT TEST` in your example) is
expected to be a container of multiple test definitions (so there will
be one for LTP, KSelfTest, etc) -- can you confirm?

Assuming the above is correct and `test` is a container of multiple test
definitions, can we add more properties to each:
 * name -- would be a unique name id for each test
 * description -- short description of the test.
 * arch -- applicable platform architectures
 * runtime -- This is subjective as it can be different for different
   systems, but maybe we can have some generic names, like 'SHORT',
   'MEDIUM', 'LONG', etc and each system may scale the timeout locally?

I see you have a `Subsystems` entry in the comments section, but not in the
example. Do you expect it to be part of this file, or will there be a
file per each subsystem?

Can we define what we mean by a `test`? For me this is a group of one or
more individual testcases that can be initiated with a single
command-line, and is expected to run in a 'reasonable' time. Any other
thoughts?

Thanks!
Minas
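
A sketch of how those extra per-test properties could slot into the existing template without changing the rest of the schema; every value below is illustrative:

---
  test:
    - name: kunit-core
      description: KUnit core suites run under UML
      arch:
        - x86_64
        - arm64
      runtime: SHORT
      path: .
      cmd: tools/testing/kunit/kunit.py
      param: run
---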

* Re: [RFC] Test catalog template
From: Donald Zickus @ 2024-10-18 19:44 UTC (permalink / raw)
To: Minas Hambardzumyan
Cc: workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

On Thu, Oct 17, 2024 at 8:32 AM Minas Hambardzumyan <minas@ti.com> wrote:
>
> On 10/14/24 15:32, Donald Zickus wrote:
> > Hi,
> >
> > At Linux Plumbers, a few dozen of us gathered together to discuss how
> > to expose what tests subsystem maintainers would like to run for every
> > patch submitted or when CI runs tests. We agreed on a mock up of a
> > yaml template to start gathering info. The yaml file could be
> > temporarily stored on kernelci.org until a more permanent home could
> > be found. Attached is a template to start the conversation.
> >
> > Longer story.
> >
> > The current problem is CI systems are not unanimous about what tests
> > they run on submitted patches or git branches. This makes it
> > difficult to figure out why a test failed or how to reproduce.
> > Further, it isn't always clear what tests a normal contributor should
> > run before posting patches.
> >
> > It has been long communicated that the tests LTP, xfstest and/or
> > kselftests should be the tests to run. However, not all maintainers
> > use those tests for their subsystems. I am hoping to either capture
> > those tests or find ways to convince them to add their tests to the
> > preferred locations.
> >
> > The goal is for a given subsystem (defined in MAINTAINERS), define a
> > set of tests that should be run for any contributions to that
> > subsystem. The hope is the collective CI results can be triaged
> > collectively (because they are related) and even have the numerous
> > flakes waived collectively (same reason) improving the ability to
> > find and debug new test failures. Because the tests and process are
> > known, having a human help debug any failures becomes easier.
> >
> > The plan is to put together a minimal yaml template that gets us going
> > (even if it is not optimized yet) and aim for about a dozen or so
> > subsystems. At that point we should have enough feedback to promote
> > this more seriously and talk optimizations.
> >
> > Feedback encouraged.
> >
> > Cheers,
> > Don
> >
> > ---
> > # List of tests by subsystem
> > #
> > # Tests should adhere to KTAP definitions for results
> > #
> > # Description of section entries
> > #
> > # maintainer: test maintainer - name <email>
> > # list: mailing list for discussion
> > # version: stable version of the test
> > # dependency: necessary distro package for testing
> > # test:
> > #   path: internal git path or url to fetch from
> > #   cmd: command to run; ability to run locally
> > #   param: additional param necessary to run test
> > # hardware: hardware necessary for validation
> > #
> > # Subsystems (alphabetical)
> >
> > KUNIT TEST:
> >   maintainer:
> >     - name: name1
> >       email: email1
> >     - name: name2
> >       email: email2
> >   list:
> >   version:
> >   dependency:
> >     - dep1
> >     - dep2
> >   test:
> >     - path: tools/testing/kunit
> >       cmd:
> >       param:
> >     - path:
> >       cmd:
> >       param:
> >   hardware: none
> >
>
> Don,
>
> thanks for initiating this! I have a few questions/suggestions:
>
> I think the root element in a section (`KUNIT TEST` in your example) is
> expected to be a container of multiple test definitions (so there will
> be one for LTP, KSelfTest, etc) -- can you confirm?

Actually I may have misled you. 'KUNIT TEST' was an example I picked
out of the MAINTAINERS file as a maintained subsystem that folks
contribute code to. Well, it was the example folks suggested I use at
Plumbers (from what I recall). Inside the subsystem container is a
'test' section that is the container of tests needed for the
subsystem.

>
> Assuming the above is correct and `test` is a container of multiple test
> definitions, can we add more properties to each:
>  * name -- would be a unique name id for each test
>  * description -- short description of the test.
>  * arch -- applicable platform architectures
>  * runtime -- This is subjective as it can be different for different
>    systems, but maybe we can have some generic names, like 'SHORT',
>    'MEDIUM', 'LONG', etc and each system may scale the timeout locally?

Based on what I said above, does that change your thoughts a bit? In
my head the tests are already out there and defined, I am not sure we
can request them to be unique. And the description can be found in the
url, as I envisioned some tests being run across multiple subsystems,
hence minimizing the duplication may be useful. Happy to be swayed in
a different direction.

I like the idea of a 'timeout'. That has been useful for our tests
internally. I can add that to the fields.

>
> I see you have a `Subsystems` entry in the comments section, but not in the
> example. Do you expect it to be part of this file, or will there be a
> file per each subsystem?

Hopefully my above comments clarify your confusion? The subsystem is
'KUNIT TEST' in this example.

>
> Can we define what we mean by a `test`? For me this is a group of one or
> more individual testcases that can be initiated with a single
> command-line, and is expected to run in a 'reasonable' time. Any other
> thoughts?

Yes. I was thinking a test(s) is something the subsystem maintainer
expects all contributors (humans) or testers (human or CI bots) to
that subsystem to run on posted patches. The test is expected to be
command line driven (copy-n-paste is probably preferable) and it can
consist of multiple test command lines or a larger testsuite.

Also happy to be swayed differently. Interested in your feedback to my
comments.

Cheers,
Don

>
> Thanks!
> Minas
>

* Re: [RFC] Test catalog template
From: David Gow @ 2024-10-18 7:21 UTC (permalink / raw)
To: Donald Zickus
Cc: workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

Hi Don,

Thanks for putting this together: the discussion at Plumbers was very useful.

On Tue, 15 Oct 2024 at 04:33, Donald Zickus <dzickus@redhat.com> wrote:
>
> Hi,
>
> At Linux Plumbers, a few dozen of us gathered together to discuss how
> to expose what tests subsystem maintainers would like to run for every
> patch submitted or when CI runs tests. We agreed on a mock up of a
> yaml template to start gathering info. The yaml file could be
> temporarily stored on kernelci.org until a more permanent home could
> be found. Attached is a template to start the conversation.
>

I think that there are two (maybe three) separate problems here:
1. What tests do we want to run (for a given patch/subsystem/environment/etc)?
2. How do we describe those tests in such a way that running them can
be automated?
3. (Exactly what constitutes a 'test'? A single 'test', a whole suite
of tests, a test framework/tool? What about the environment: is, e.g.,
KUnit on UML different from KUnit on qemu-x86_64 different from KUnit
on qemu-arm64?)

My gut feeling here is that (1) is technically quite easy: worst-case
we just make every MAINTAINERS entry link to a document describing
what tests should be run. Actually getting people to write these
documents and then run the tests, though, is very difficult.

(2) is the area where I think this will be most useful. We have some
arbitrary (probably .yaml) file which describes a series of tests to
run in enough detail that we can automate it. My ideal outcome here
would be to have a 'kunit.yaml' file which I can pass to a tool
(either locally or automatically on some CI system) which will run all
of the checks I'd run on an incoming patch. This would include
everything from checkpatch, to test builds, to running KUnit tests and
other test scripts. Ideally, it'd even run these across a bunch of
different environments (architectures, emulators, hardware, etc) to
catch issues which only show up on big-endian or 32-bit machines.

If this means I can publish that yaml file somewhere, and not only
give contributors a way to check that those tests pass on their own
machine before sending a patch out, but also have CI systems
automatically run them (so the results are ready waiting before I
manually review the patch), that'd be ideal.

> Longer story.
>
> The current problem is CI systems are not unanimous about what tests
> they run on submitted patches or git branches. This makes it
> difficult to figure out why a test failed or how to reproduce.
> Further, it isn't always clear what tests a normal contributor should
> run before posting patches.
>
> It has been long communicated that the tests LTP, xfstest and/or
> kselftests should be the tests to run. However, not all maintainers
> use those tests for their subsystems. I am hoping to either capture
> those tests or find ways to convince them to add their tests to the
> preferred locations.
>
> The goal is for a given subsystem (defined in MAINTAINERS), define a
> set of tests that should be run for any contributions to that
> subsystem. The hope is the collective CI results can be triaged
> collectively (because they are related) and even have the numerous
> flakes waived collectively (same reason) improving the ability to
> find and debug new test failures. Because the tests and process are
> known, having a human help debug any failures becomes easier.
>
> The plan is to put together a minimal yaml template that gets us going
> (even if it is not optimized yet) and aim for about a dozen or so
> subsystems. At that point we should have enough feedback to promote
> this more seriously and talk optimizations.
>
> Feedback encouraged.
>
> Cheers,
> Don
>
> ---
> # List of tests by subsystem

I think we should split this up into several files, partly to avoid
merge conflicts, partly to make it easy to maintain custom collections
of tests separately.

For example, fs.yaml could contain entries for both xfstests and fs
KUnit and selftests.

It's also probably going to be necessary to have separate sets of
tests for different use-cases. For example, there might be a smaller,
quicker set of tests to run on every patch, and a much longer, more
expensive set which only runs every other day. So I don't think
there'll even be a 1:1 mapping between 'test collections' (files) and
subsystems. But an automated way of running "this collection of tests"
would be very useful, particularly if it's more user-friendly than
just writing a shell script (e.g., having nicely formatted output,
being able to run things in parallel or remotely, etc).

> #
> # Tests should adhere to KTAP definitions for results
> #
> # Description of section entries
> #
> # maintainer: test maintainer - name <email>
> # list: mailing list for discussion
> # version: stable version of the test
> # dependency: necessary distro package for testing
> # test:
> #   path: internal git path or url to fetch from
> #   cmd: command to run; ability to run locally
> #   param: additional param necessary to run test
> # hardware: hardware necessary for validation
> #
> # Subsystems (alphabetical)
>
> KUNIT TEST:

For KUnit, it'll be interesting to draw the distinction between KUnit
overall and individual KUnit suites.
I'd lean towards having a separate entry for each subsystem's KUnit
tests (including one for KUnit's own tests)

>   maintainer:
>     - name: name1
>       email: email1
>     - name: name2
>       email: email2
>   list:

How important is it to have these in the case where they're already in
the MAINTAINERS file? I can see it being important for tests which
live elsewhere, though eventually, I'd still prefer the subsystem
maintainer to take some responsibility for the tests run for their
subsystems.

>   version:

This field is probably unnecessary for test frameworks which live in
the kernel tree.

>   dependency:
>     - dep1
>     - dep2

If we want to automate this in any way, we're going to need to work
out a way of specifying these. Either we'd have to pick a distro's
package names, or have our own mapping.

(A part of me really likes the idea of having a small list of "known"
dependencies: python, docker, etc, and trying to limit tests to using
those dependencies. Though there are plenty of useful tests with more
complicated dependencies, so that probably won't fly forever.)

>   test:
>     - path: tools/testing/kunit
>       cmd:
>       param:
>     - path:
>       cmd:
>       param:

Is 'path' here supposed to be the path to the test binary, the working
directory, etc?
Maybe there should be 'working_directory', 'cmd', 'args', and 'env'.

>   hardware: none


For KUnit, I'd imagine having a kunit.yaml, with something like this,
including the KUnit tests in the 'kunit' and 'example' suites, and the
'kunit_tool_test.py' test script:

---
KUnit:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit.py
      param: run kunit
    - path: .
      cmd: tools/testing/kunit.py
      param: run example
  hardware: none
KUnit Tool:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
    - name: Brendan Higgins
      email: brendan.higgins@linux.dev
  list: kunit-dev@googlegroups.com
  dependency:
    - python3
  test:
    - path: .
      cmd: tools/testing/kunit_tool_test.py
      param:
  hardware: none
---

Obviously there's still some redundancy there, and I've not actually
tried implementing something that could run it. It also lacks any
information about the environment. In practice, I have about 20
different kunit.py invocations which run the tests with different
configs and on different architectures. Though that might make sense
to keep in a separate file to only run if the simpler tests pass. And
equally, it'd be nice to have a 'common.yaml' file with basic patch
and build tests which apply to almost everything (checkpatch, make
defconfig, maybe even make allmodconfig, etc).

Cheers,
-- David
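
A minimal sketch of the "common.yaml" idea mentioned above, reusing the same schema; the section name is made up, and the exact checkpatch and build parameters are illustrative rather than a settled recommendation:

---
COMMON PATCH CHECKS:
  maintainer:
    - name: placeholder-name
      email: placeholder@example.org
  list: workflows@vger.kernel.org
  dependency:
    - perl
    - make
  test:
    - path: .
      cmd: scripts/checkpatch.pl
      param: -g HEAD-1                # check the top commit
    - path: .
      cmd: make
      param: defconfig
    - path: .
      cmd: make
      param: -j8                      # assumes the defconfig step above ran first
  hardware: none
---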

* Re: [RFC] Test catalog template
From: Gustavo Padovan @ 2024-10-18 14:23 UTC (permalink / raw)
To: David Gow
Cc: Donald Zickus, workflows, automated-testing, linux-kselftest,
    kernelci, Nikolai Kondrashov, kernelci-members, laura.nao

Hello,

---- On Fri, 18 Oct 2024 04:21:58 -0300 David Gow wrote ---

> Hi Don,
>
> Thanks for putting this together: the discussion at Plumbers was very useful.
>
> On Tue, 15 Oct 2024 at 04:33, Donald Zickus <dzickus@redhat.com> wrote:
> >
> > Hi,
> >
> > At Linux Plumbers, a few dozen of us gathered together to discuss how
> > to expose what tests subsystem maintainers would like to run for every
> > patch submitted or when CI runs tests. We agreed on a mock up of a
> > yaml template to start gathering info. The yaml file could be
> > temporarily stored on kernelci.org until a more permanent home could
> > be found. Attached is a template to start the conversation.
> >
>
> I think that there are two (maybe three) separate problems here:
> 1. What tests do we want to run (for a given patch/subsystem/environment/etc)?
> 2. How do we describe those tests in such a way that running them can
> be automated?
> 3. (Exactly what constitutes a 'test'? A single 'test', a whole suite
> of tests, a test framework/tool? What about the environment: is, e.g.,
> KUnit on UML different from KUnit on qemu-x86_64 different from KUnit
> on qemu-arm64?)
>
> My gut feeling here is that (1) is technically quite easy: worst-case
> we just make every MAINTAINERS entry link to a document describing
> what tests should be run. Actually getting people to write these
> documents and then run the tests, though, is very difficult.
>
> (2) is the area where I think this will be most useful. We have some
> arbitrary (probably .yaml) file which describes a series of tests to
> run in enough detail that we can automate it. My ideal outcome here
> would be to have a 'kunit.yaml' file which I can pass to a tool
> (either locally or automatically on some CI system) which will run all
> of the checks I'd run on an incoming patch. This would include
> everything from checkpatch, to test builds, to running KUnit tests and
> other test scripts. Ideally, it'd even run these across a bunch of
> different environments (architectures, emulators, hardware, etc) to
> catch issues which only show up on big-endian or 32-bit machines.
>
> If this means I can publish that yaml file somewhere, and not only
> give contributors a way to check that those tests pass on their own
> machine before sending a patch out, but also have CI systems
> automatically run them (so the results are ready waiting before I
> manually review the patch), that'd be ideal.

This thought makes sense to me. It will be very interesting for CI systems
to be able to figure out which tests to run for a set of folder/file changes.

However, I also feel that a key part of the work is actually convincing
people to write (and maintain!) these specs. Only through CI automation may
we be able to show the value of this task, prompting maintainers to keep
their files updated; otherwise we are going to create a sea of specs that
will just be outdated pretty quickly.

In the new KernelCI maestro, we started with only a handful of tests, so we
could actually look at the results, find regressions and report them.

Maybe we could start in the same way with a few tests, e.g. kselftest-dt and
kselftest-acpi. It should be relatively simple to make something that will
decide on testing the probe of drivers based on which files are being changed.

There needs to be a sort of cultural shift on how we track tests first. Just
documenting our current tests may not take us far, but starting small with a
comprehensive process from test spec to CI automation to clear ways of
delivering results is the game changer.

Then there are other perspectives that cross this. For example, many of the LTP and
kselftests will just fail, but there is no accumulated knowledge on what the result of
each test means. So understanding what is expected to pass/fail for each platform is
a sort of dependency in this extensive documentation effort we have set ourselves.

Best,

- Gus

* Re: [Automated-testing] [RFC] Test catalog template
From: Cyril Hrubis @ 2024-10-18 14:35 UTC (permalink / raw)
To: Gustavo Padovan
Cc: David Gow, Donald Zickus, workflows, automated-testing,
    linux-kselftest, kernelci, Nikolai Kondrashov, kernelci-members, laura.nao

Hi!
> Then there are other perspectives that cross this. For example, many of the LTP and
> kselftests will just fail, but there is no accumulated knowledge on what the result of
> each test means. So understanding what is expected to pass/fail for each platform is
> a sort of dependency in this extensive documentation effort we have set ourselves.

We are spending quite a lot of time to make sure LTP tests do not fail
unless there is a reason to. If you see LTP tests failing and you think
that they shouldn't, just report it on the LTP mailing list and we will
fix that.

--
Cyril Hrubis
chrubis@suse.cz

* Re: [RFC] Test catalog template
From: Mark Brown @ 2024-10-18 19:17 UTC (permalink / raw)
To: David Gow
Cc: Donald Zickus, workflows, automated-testing, linux-kselftest,
    kernelci, Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

On Fri, Oct 18, 2024 at 03:21:58PM +0800, David Gow wrote:

> It's also probably going to be necessary to have separate sets of
> tests for different use-cases. For example, there might be a smaller,
> quicker set of tests to run on every patch, and a much longer, more
> expensive set which only runs every other day. So I don't think
> there'll even be a 1:1 mapping between 'test collections' (files) and
> subsystems. But an automated way of running "this collection of tests"
> would be very useful, particularly if it's more user-friendly than
> just writing a shell script (e.g., having nicely formatted output,
> being able to run things in parallel or remotely, etc).

This is definitely the case for me, I have an escalating set of tests
that I run per patch, per branch and for things like sending pull
requests.

> >   maintainer:
> >     - name: name1
> >       email: email1
> >     - name: name2
> >       email: email2
> >   list:

> How important is it to have these in the case where they're already in
> the MAINTAINERS file? I can see it being important for tests which
> live elsewhere, though eventually, I'd still prefer the subsystem
> maintainer to take some responsibility for the tests run for their
> subsystems.

It does seem useful to list the maintainers for tests in addition to
the maintainers for the code, and like you say some of the tests are
out of tree.

* Re: [RFC] Test catalog template
From: Donald Zickus @ 2024-10-18 20:17 UTC (permalink / raw)
To: David Gow
Cc: workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

On Fri, Oct 18, 2024 at 3:22 AM David Gow <davidgow@google.com> wrote:
>
> Hi Don,
>
> Thanks for putting this together: the discussion at Plumbers was very useful.
>
> On Tue, 15 Oct 2024 at 04:33, Donald Zickus <dzickus@redhat.com> wrote:
> >
> > Hi,
> >
> > At Linux Plumbers, a few dozen of us gathered together to discuss how
> > to expose what tests subsystem maintainers would like to run for every
> > patch submitted or when CI runs tests. We agreed on a mock up of a
> > yaml template to start gathering info. The yaml file could be
> > temporarily stored on kernelci.org until a more permanent home could
> > be found. Attached is a template to start the conversation.
> >
>
> I think that there are two (maybe three) separate problems here:
> 1. What tests do we want to run (for a given patch/subsystem/environment/etc)?

My thinking is this is maintainer's choice. What would they like to
see run on a patch to verify its correctness? I would like to think
most maintainers already have scripts they run before committing
patches to their -next branch. All I am trying to do is expose what
is already being done I believe.

> 2. How do we describe those tests in such a way that running them can
> be automated?

This is the tricky part. But I am going to assume that if most
maintainers run tests before committing patches to their -next branch,
then there is a good chance those tests are scripted and command line
driven (this is the kernel community, right :-) ). So if we could
expose those scripts and make them copy-and-pastable such that
contributors or testers (including CI bots) can just copy and run
them. Some maintainers have more complex environments and separating
command line driven tests from the environment scripts might be
tricky.

Does that sound reasonable?

> 3. (Exactly what constitutes a 'test'? A single 'test', a whole suite
> of tests, a test framework/tool? What about the environment: is, e.g.,
> KUnit on UML different from KUnit on qemu-x86_64 different from KUnit
> on qemu-arm64?)
>
> My gut feeling here is that (1) is technically quite easy: worst-case
> we just make every MAINTAINERS entry link to a document describing
> what tests should be run. Actually getting people to write these
> documents and then run the tests, though, is very difficult.

Well if I look at kunit or kselftest, really all you are doing as a
subsystem maintainer is asking contributors or testers to run a 'make'
command right? Everything else is already documented I think.

>
> (2) is the area where I think this will be most useful. We have some
> arbitrary (probably .yaml) file which describes a series of tests to
> run in enough detail that we can automate it. My ideal outcome here
> would be to have a 'kunit.yaml' file which I can pass to a tool
> (either locally or automatically on some CI system) which will run all
> of the checks I'd run on an incoming patch. This would include
> everything from checkpatch, to test builds, to running KUnit tests and
> other test scripts. Ideally, it'd even run these across a bunch of
> different environments (architectures, emulators, hardware, etc) to
> catch issues which only show up on big-endian or 32-bit machines.
>
> If this means I can publish that yaml file somewhere, and not only
> give contributors a way to check that those tests pass on their own
> machine before sending a patch out, but also have CI systems
> automatically run them (so the results are ready waiting before I
> manually review the patch), that'd be ideal.

Yes, that is exactly the goal of this exercise. :-) but instead of a
kunit.yaml file, it is more of a test.yaml file with hundreds of
subsystems inside it (and probably a corresponding get_tests.pl
script)[think how MAINTAINERS file operates and this is a sister
file].

Inside the 'KUNIT' section would be a container of tests that would be
expected to run (like you listed). Each test has its own command line
and params.

>
> > Longer story.
> >
> > The current problem is CI systems are not unanimous about what tests
> > they run on submitted patches or git branches. This makes it
> > difficult to figure out why a test failed or how to reproduce.
> > Further, it isn't always clear what tests a normal contributor should
> > run before posting patches.
> >
> > It has been long communicated that the tests LTP, xfstest and/or
> > kselftests should be the tests to run. However, not all maintainers
> > use those tests for their subsystems. I am hoping to either capture
> > those tests or find ways to convince them to add their tests to the
> > preferred locations.
> >
> > The goal is for a given subsystem (defined in MAINTAINERS), define a
> > set of tests that should be run for any contributions to that
> > subsystem. The hope is the collective CI results can be triaged
> > collectively (because they are related) and even have the numerous
> > flakes waived collectively (same reason) improving the ability to
> > find and debug new test failures. Because the tests and process are
> > known, having a human help debug any failures becomes easier.
> >
> > The plan is to put together a minimal yaml template that gets us going
> > (even if it is not optimized yet) and aim for about a dozen or so
> > subsystems. At that point we should have enough feedback to promote
> > this more seriously and talk optimizations.
> >
> > Feedback encouraged.
> >
> > Cheers,
> > Don
> >
> > ---
> > # List of tests by subsystem
>
> I think we should split this up into several files, partly to avoid
> merge conflicts, partly to make it easy to maintain custom collections
> of tests separately.
>
> For example, fs.yaml could contain entries for both xfstests and fs
> KUnit and selftests.

I am not opposed to the idea. But I am a fan of the user experience.
So while an fs.yaml might sound good, is it obvious to a contributor
or tester that given a patch, do they know if fs.yaml is the correct
yaml file to parse when running tests? How do you map a patch to a
yaml file? I was trying to use subsystems like MAINTAINERS (and
get_maintainers.pl) as my mapping. Open to better suggestions.

>
> It's also probably going to be necessary to have separate sets of
> tests for different use-cases. For example, there might be a smaller,
> quicker set of tests to run on every patch, and a much longer, more
> expensive set which only runs every other day. So I don't think
> there'll even be a 1:1 mapping between 'test collections' (files) and
> subsystems. But an automated way of running "this collection of tests"
> would be very useful, particularly if it's more user-friendly than
> just writing a shell script (e.g., having nicely formatted output,
> being able to run things in parallel or remotely, etc).

I don't disagree. I am trying to start small to get things going and
some momentum. I proposed a container of tests section. I would like
to think adding another field in each individual test area like
(short, medium, long OR mandatory, performance, nice-to-have) would be
easy to add to the yaml file overall and attempt to accomplish what
you are suggesting. Thoughts?

>
> > #
> > # Tests should adhere to KTAP definitions for results
> > #
> > # Description of section entries
> > #
> > # maintainer: test maintainer - name <email>
> > # list: mailing list for discussion
> > # version: stable version of the test
> > # dependency: necessary distro package for testing
> > # test:
> > #   path: internal git path or url to fetch from
> > #   cmd: command to run; ability to run locally
> > #   param: additional param necessary to run test
> > # hardware: hardware necessary for validation
> > #
> > # Subsystems (alphabetical)
> >
> > KUNIT TEST:
>
> For KUnit, it'll be interesting to draw the distinction between KUnit
> overall and individual KUnit suites.
> I'd lean towards having a separate entry for each subsystem's KUnit
> tests (including one for KUnit's own tests)

KUNIT may not have been the best 'common' test example due to its
complexities across other subsystems. :-/

>
> >   maintainer:
> >     - name: name1
> >       email: email1
> >     - name: name2
> >       email: email2
> >   list:
>
> How important is it to have these in the case where they're already in
> the MAINTAINERS file? I can see it being important for tests which
> live elsewhere, though eventually, I'd still prefer the subsystem
> maintainer to take some responsibility for the tests run for their
> subsystems.

I wasn't sure if all subsystem maintainers actually want to maintain
the tests too or just point someone else at it. I look at LTP as an
example here. But I could be wrong.

>
> >   version:
>
> This field is probably unnecessary for test frameworks which live in
> the kernel tree.

Possibly. It was brought up at Plumbers, so I included it for
completeness.

>
> >   dependency:
> >     - dep1
> >     - dep2
>
> If we want to automate this in any way, we're going to need to work
> out a way of specifying these. Either we'd have to pick a distro's
> package names, or have our own mapping.

Agreed. I might lean on what 'perf' outputs. They do dependency
detection and output suggested missing packages. Their auto detection
of already included deps is rather complicated though.

>
> (A part of me really likes the idea of having a small list of "known"
> dependencies: python, docker, etc, and trying to limit tests to using
> those dependencies. Though there are plenty of useful tests with more
> complicated dependencies, so that probably won't fly forever.)

Hehe. For Fedora/RHEL at least, python has hundreds of smaller library
packages. That is tricky. And further some tests like to compile,
which means a bunch of -devel packages. Each distro has different
names for their -devel packages. :-/

But a side goal of this effort is to define some community standards.
Perhaps we can influence things here to clean up this problem??

>
> >   test:
> >     - path: tools/testing/kunit
> >       cmd:
> >       param:
> >     - path:
> >       cmd:
> >       param:
>
> Is 'path' here supposed to be the path to the test binary, the working
> directory, etc?
> Maybe there should be 'working_directory', 'cmd', 'args', and 'env'.

The thought was the command to copy-n-paste to run the test after
installing it. I am thinking most tests might be a git-clone or
exploded tarball, leaving the path to be from the install point. So
maybe working_directory is more descriptive.

>
> >   hardware: none
>
>
> For KUnit, I'd imagine having a kunit.yaml, with something like this,
> including the KUnit tests in the 'kunit' and 'example' suites, and the
> 'kunit_tool_test.py' test script:
>
> ---
> KUnit:
>   maintainer:
>     - name: David Gow
>       email: davidgow@google.com
>     - name: Brendan Higgins
>       email: brendan.higgins@linux.dev
>   list: kunit-dev@googlegroups.com
>   dependency:
>     - python3
>   test:
>     - path: .
>       cmd: tools/testing/kunit.py
>       param: run kunit
>     - path: .
>       cmd: tools/testing/kunit.py
>       param: run example
>   hardware: none
> KUnit Tool:
>   maintainer:
>     - name: David Gow
>       email: davidgow@google.com
>     - name: Brendan Higgins
>       email: brendan.higgins@linux.dev
>   list: kunit-dev@googlegroups.com
>   dependency:
>     - python3
>   test:
>     - path: .
>       cmd: tools/testing/kunit_tool_test.py
>       param:
>   hardware: none
> ---
>
> Obviously there's still some redundancy there, and I've not actually
> tried implementing something that could run it. It also lacks any
> information about the environment. In practice, I have about 20
> different kunit.py invocations which run the tests with different
> configs and on different architectures. Though that might make sense
> to keep in a separate file to only run if the simpler tests pass. And
> equally, it'd be nice to have a 'common.yaml' file with basic patch
> and build tests which apply to almost everything (checkpatch, make
> defconfig, maybe even make allmodconfig, etc).

Nice, thanks for the more detailed example.

Cheers,
Don

>
> Cheers,
> -- David

* Re: [RFC] Test catalog template
From: David Gow @ 2024-10-19 6:36 UTC (permalink / raw)
To: Donald Zickus
Cc: workflows, automated-testing, linux-kselftest, kernelci,
    Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao

On Sat, 19 Oct 2024 at 04:17, Donald Zickus <dzickus@redhat.com> wrote:
>
> On Fri, Oct 18, 2024 at 3:22 AM David Gow <davidgow@google.com> wrote:
> >
> > Hi Don,
> >
> > Thanks for putting this together: the discussion at Plumbers was very useful.
> >
> > On Tue, 15 Oct 2024 at 04:33, Donald Zickus <dzickus@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > At Linux Plumbers, a few dozen of us gathered together to discuss how
> > > to expose what tests subsystem maintainers would like to run for every
> > > patch submitted or when CI runs tests. We agreed on a mock up of a
> > > yaml template to start gathering info. The yaml file could be
> > > temporarily stored on kernelci.org until a more permanent home could
> > > be found. Attached is a template to start the conversation.
> > >
> >
> > I think that there are two (maybe three) separate problems here:
> > 1. What tests do we want to run (for a given patch/subsystem/environment/etc)?
>
> My thinking is this is maintainer's choice. What would they like to
> see run on a patch to verify its correctness? I would like to think
> most maintainers already have scripts they run before committing
> patches to their -next branch. All I am trying to do is expose what
> is already being done I believe.
>

Agreed.

> > 2. How do we describe those tests in such a way that running them can
> > be automated?
>
> This is the tricky part. But I am going to assume that if most
> maintainers run tests before committing patches to their -next branch,
> then there is a good chance those tests are scripted and command line
> driven (this is the kernel community, right :-) ). So if we could
> expose those scripts and make them copy-and-pastable such that
> contributors or testers (including CI bots) can just copy and run
> them. Some maintainers have more complex environments and separating
> command line driven tests from the environment scripts might be
> tricky.
>
> Does that sound reasonable?

Yeah: that's basically what I'd want.

>
> > 3. (Exactly what constitutes a 'test'? A single 'test', a whole suite
> > of tests, a test framework/tool? What about the environment: is, e.g.,
> > KUnit on UML different from KUnit on qemu-x86_64 different from KUnit
> > on qemu-arm64?)
> >
> > My gut feeling here is that (1) is technically quite easy: worst-case
> > we just make every MAINTAINERS entry link to a document describing
> > what tests should be run. Actually getting people to write these
> > documents and then run the tests, though, is very difficult.
>
> Well if I look at kunit or kselftest, really all you are doing as a
> subsystem maintainer is asking contributors or testers to run a 'make'
> command right? Everything else is already documented I think.
>
> >
> > (2) is the area where I think this will be most useful. We have some
> > arbitrary (probably .yaml) file which describes a series of tests to
> > run in enough detail that we can automate it. My ideal outcome here
> > would be to have a 'kunit.yaml' file which I can pass to a tool
> > (either locally or automatically on some CI system) which will run all
> > of the checks I'd run on an incoming patch. This would include
> > everything from checkpatch, to test builds, to running KUnit tests and
> > other test scripts. Ideally, it'd even run these across a bunch of
> > different environments (architectures, emulators, hardware, etc) to
> > catch issues which only show up on big-endian or 32-bit machines.
> >
> > If this means I can publish that yaml file somewhere, and not only
> > give contributors a way to check that those tests pass on their own
> > machine before sending a patch out, but also have CI systems
> > automatically run them (so the results are ready waiting before I
> > manually review the patch), that'd be ideal.
>
> Yes, that is exactly the goal of this exercise. :-) but instead of a
> kunit.yaml file, it is more of a test.yaml file with hundreds of
> subsystems inside it (and probably a corresponding get_tests.pl
> script)[think how MAINTAINERS file operates and this is a sister
> file].
>
> Inside the 'KUNIT' section would be a container of tests that would be
> expected to run (like you listed). Each test has its own command line
> and params.
>

Yeah. My hope is we can have a "run_tests" tool which parses that
file/files and runs everything. So whether that ends up being:
run_tests --subsystem "KUNIT" --subsystem "MM"
or
run_test --file "kunit.yaml" --file "mm.yaml"
or even
run_test --patch "my_mm_and_kunit_change.patch"

A CI system can just run it against the changed files in patches, a
user who wants to double check something specific can override it to
force the tests for a subsystem which may be indirectly affected. And
if you're working on some new tests, or some private internal ones,
you can keep your own yaml file and pass that along too.

> >
> > > Longer story.
> > >
> > > The current problem is CI systems are not unanimous about what tests
> > > they run on submitted patches or git branches. This makes it
> > > difficult to figure out why a test failed or how to reproduce.
> > > Further, it isn't always clear what tests a normal contributor should
> > > run before posting patches.
> > >
> > > It has been long communicated that the tests LTP, xfstest and/or
> > > kselftests should be the tests to run. However, not all maintainers
> > > use those tests for their subsystems. I am hoping to either capture
> > > those tests or find ways to convince them to add their tests to the
> > > preferred locations.
> > >
> > > The goal is for a given subsystem (defined in MAINTAINERS), define a
> > > set of tests that should be run for any contributions to that
> > > subsystem. The hope is the collective CI results can be triaged
> > > collectively (because they are related) and even have the numerous
> > > flakes waived collectively (same reason) improving the ability to
> > > find and debug new test failures. Because the tests and process are
> > > known, having a human help debug any failures becomes easier.
> > >
> > > The plan is to put together a minimal yaml template that gets us going
> > > (even if it is not optimized yet) and aim for about a dozen or so
> > > subsystems. At that point we should have enough feedback to promote
> > > this more seriously and talk optimizations.
> > >
> > > Feedback encouraged.
> > >
> > > Cheers,
> > > Don
> > >
> > > ---
> > > # List of tests by subsystem
> >
> > I think we should split this up into several files, partly to avoid
> > merge conflicts, partly to make it easy to maintain custom collections
> > of tests separately.
> >
> > For example, fs.yaml could contain entries for both xfstests and fs
> > KUnit and selftests.
>
> I am not opposed to the idea. But I am a fan of the user experience.
> So while an fs.yaml might sound good, is it obvious to a contributor
> or tester that given a patch, do they know if fs.yaml is the correct
> yaml file to parse when running tests? How do you map a patch to a
> yaml file? I was trying to use subsystems like MAINTAINERS (and
> get_maintainers.pl) as my mapping. Open to better suggestions.
>

One option would be to have multiple files, which still have the
MAINTAINERS subsystems listed within, and worst-case a tool just
parses all of the files in that directory until it finds a matching
one. Maybe a bit slower than having everything in the one file, but it
sidesteps merge conflicts well.

But ideally, I'd like (as mentioned below) to have a tool which I can
use to run tests locally, and being able to run, e.g.,
./run_tests --all -f fs.yaml
If I want to specify the tests I want to run manually, personally I
think a filename would be a bit nicer than having to pass through,
e.g., subsystem names.

> >
> > It's also probably going to be necessary to have separate sets of
> > tests for different use-cases. For example, there might be a smaller,
> > quicker set of tests to run on every patch, and a much longer, more
> > expensive set which only runs every other day. So I don't think
> > there'll even be a 1:1 mapping between 'test collections' (files) and
> > subsystems. But an automated way of running "this collection of tests"
> > would be very useful, particularly if it's more user-friendly than
> > just writing a shell script (e.g., having nicely formatted output,
> > being able to run things in parallel or remotely, etc).
>
> I don't disagree. I am trying to start small to get things going and
> some momentum. I proposed a container of tests section. I would like
> to think adding another field in each individual test area like
> (short, medium, long OR mandatory, performance, nice-to-have) would be
> easy to add to the yaml file overall and attempt to accomplish what
> you are suggesting. Thoughts?
>

I think that'd be a great idea. Maybe a "stage" field could work, too,
where later tests only run if the previous ones pass.

For example:
Stage 0: checkpatch, does it build
Stage 1: KUnit tests, unit tests, single architecture
Stage 2: Full boot tests, selftests, etc
Stage 3: The above tests on other architectures, allyesconfig,
randconfig, etc.

Regardless, it'd be useful to be able to name individual tests and/or
configurations and manually trigger them and/or filter on them.

_Maybe_ it makes sense to split up the "what tests to run" and "how
are they run" bits. The obvious split here would be to have the test
catalogue just handle the former, and the "how they're run" bit
entirely live in shell scripts. But if we're going to support running
tests in parallel and nicely displaying results, maybe there'll be a
need to have something more data driven than a shell script.
> > > > > # > > > # Tests should adhere to KTAP definitions for results > > > # > > > # Description of section entries > > > # > > > # maintainer: test maintainer - name <email> > > > # list: mailing list for discussion > > > # version: stable version of the test > > > # dependency: necessary distro package for testing > > > # test: > > > # path: internal git path or url to fetch from > > > # cmd: command to run; ability to run locally > > > # param: additional param necessary to run test > > > # hardware: hardware necessary for validation > > > # > > > # Subsystems (alphabetical) > > > > > > KUNIT TEST: > > > > For KUnit, it'll be interesting to draw the distinction between KUnit > > overall and individual KUnit suites. > > I'd lean towards having a separate entry for each subsystem's KUnit > > tests (including one for KUnit's own tests) > > KUNIT may not have been the best 'common' test example due to its > complexities across other subsystems. :-/ > Yeah: I think KUnit tests are a good example of the sorts of tests which would be relatively easy to integrate, but KUnit as a subsystem can be a bit confusing as an example because no-one's sure if we're talking about KUnit-the-subsystem or KUnit-the-tests. > > > > > maintainer: > > > - name: name1 > > > email: email1 > > > - name: name2 > > > email: email2 > > > list: > > > > How important is it to have these in the case where they're already in > > the MAINTAINERS file? I can see it being important for tests which > > live elsewhere, though eventually, I'd still prefer the subsystem > > maintainer to take some responsibility for the tests run for their > > subsystems. > > I wasn't sure if all subsystem maintainers actually want to maintain > the tests too or just point someone else at it. I look at LTP as an > example here. But I could be wrong. > Fair enough. Maybe we just make this optional, and if empty we "default" to the subsystem maintainer. > > > > > version: > > > > This field is probably unnecessary for test frameworks which live in > > the kernel tree. > > Possibly. It was brought up at Plumbers, so I included it for completeness. > Yeah. Again, good to have, but make it optional. > > > > > dependency: > > > - dep1 > > > - dep2 > > > > If we want to automate this in any way, we're going to need to work > > out a way of specifying these. Either we'd have to pick a distro's > > package names, or have our own mapping. > > Agreed. I might lean on what 'perf' outputs. They do dependency > detection and output suggested missing packages. Their auto detection > of already included deps is rather complicated though. > Sounds good. > > > > (A part of me really likes the idea of having a small list of "known" > > dependencies: python, docker, etc, and trying to limit tests to using > > those dependencies. Though there are plenty of useful tests with more > > complicated dependencies, so that probably won't fly forever.) > > Hehe. For Fedora/RHEL at least, python has hundreds of smaller > library packages. That is tricky. And further some tests like to > compile, which means a bunch of -devel packages. Each distro has > different names for their -devel packages. :-/ > But a side goal of this effort is to define some community standards. > Perhaps we can influence things here to clean up this problem?? > That'd be nice. :-) > > > > > test: > > > - path: tools/testing/kunit > > > cmd: > > > param: > > > - path: > > > cmd: > > > param: > > > > Is 'path' here supposed to be the path to the test binary, the working > > directory, etc? 
> > Maybe there should be 'working_directory', 'cmd', 'args', and 'env'. > > The thought was the command to copy-n-paste to run the test after > installing it. I am thinking most tests might be a git-clone or > exploded tarball, leaving the path to be from the install point. So > maybe working_directory is more descriptive. > Sounds good. In the KUnit case, the tooling currently expects the working directory to be the root of the kernel checkout, and the command to be "./tools/testing/kunit/kunit.py"... > > > > > hardware: none > > > > > > > > For KUnit, I'd imagine having a kunit.yaml, with something like this, > > including the KUnit tests in the 'kunit' and 'example' suites, and the > > 'kunit_tool_test.py' test script: > > > > --- > > KUnit: > > maintainer: > > - name: David Gow > > email: davidgow@google.com > > - name: Brendan Higgins > > email: brendan.higgins@linux.dev > > list: kunit-dev@googlegroups.com > > dependency: > > - python3 > > test: > > - path: . > > cmd: tools/testing/kunit.py > > param: run kunit > > - path: . > > cmd: tools/testing/kunit.py > > param: run example > > hardware: none > > KUnit Tool: > > maintainer: > > - name: David Gow > > email: davidgow@google.com > > - name: Brendan Higgins > > email: brendan.higgins@linux.dev > > list: kunit-dev@googlegroups.com > > dependency: > > - python3 > > test: > > - path: . > > cmd: tools/testing/kunit_tool_test.py > > param: > > hardware: none > > --- > > > > Obviously there's still some redundancy there, and I've not actually > > tried implementing something that could run it. It also lacks any > > information about the environment. In practice, I have about 20 > > different kunit.py invocations which run the tests with different > > configs and on different architectures. Though that might make sense > > to keep in a separate file to only run if the simpler tests pass. And > > equally, it'd be nice to have a 'common.yaml' file with basic patch > > and build tests which apply to almost everything (checkpatch, make > > defconfig, maybe even make allmodconfig, etc). > > Nice, thanks for the more detailed example. > Thanks, -- David [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 5294 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
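One way the optional fields and the staged ordering discussed in the message above might be written down, again purely as a sketch: the name and stage keys, and the specific checkpatch/kselftest invocations, are assumptions rather than part of the agreed template.

---
EXAMPLE SUBSYSTEM:
  list:                            # maintainer omitted: fall back to the MAINTAINERS entry
  test:
    - name: checkpatch
      stage: 0                     # cheap static checks run first
      cmd: ./scripts/checkpatch.pl
      param: --strict --git HEAD-1
    - name: kunit-smoke
      stage: 1                     # unit tests, single architecture
      cmd: ./tools/testing/kunit/kunit.py
      param: run --kunitconfig lib/kunit
    - name: kselftests
      stage: 2                     # heavier boot-level tests, only if earlier stages pass
      cmd: make
      param: -C tools/testing/selftests run_tests
  hardware: none
---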
* Re: [RFC] Test catalog template 2024-10-19 6:36 ` David Gow @ 2024-11-06 17:01 ` Donald Zickus 2024-11-20 8:16 ` David Gow 0 siblings, 1 reply; 17+ messages in thread From: Donald Zickus @ 2024-11-06 17:01 UTC (permalink / raw) To: David Gow Cc: workflows, automated-testing, linux-kselftest, kernelci, Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao Hi, Thanks for the feedback. I created a more realistic test.yaml file to start (we can split it when more tests are added) and a parser. I was going to add patch support as input to mimic get_maintainers.pl output, but that might take some time. For now, you have to manually select a subsystem. I will try to find space on kernelci.org to grow this work but you can find a git tree here[0]. From the README.md """ An attempt to map kernel subsystems to kernel tests that should be run on patches or code by humans and CI systems. Examples: Find test info for a subsystem ./get_tests.py -s 'KUNIT TEST' --info Subsystem: KUNIT TEST Maintainer: David Gow <davidgow@google.com> Mailing List: None Version: None Dependency: ['python3-mypy'] Test: smoke: Url: None Working Directory: None Cmd: ./tools/testing/kunit/kunit.py Env: None Param: run --kunitconfig lib/kunit Hardware: arm64, x86_64 Find copy-n-pastable tests for a subsystem ./get_tests.py -s 'KUNIT TEST' ./tools/testing/kunit/kunit.pyrun --kunitconfig lib/kunit """ Is this aligning with what people were expecting? Cheers, Don [0] - https://github.com/dzickusrh/test-catalog/tree/main On Sat, Oct 19, 2024 at 2:36 AM David Gow <davidgow@google.com> wrote: > > On Sat, 19 Oct 2024 at 04:17, Donald Zickus <dzickus@redhat.com> wrote: > > > > On Fri, Oct 18, 2024 at 3:22 AM David Gow <davidgow@google.com> wrote: > > > > > > Hi Don, > > > > > > Thanks for putting this together: the discussion at Plumbers was very useful. > > > > > > On Tue, 15 Oct 2024 at 04:33, Donald Zickus <dzickus@redhat.com> wrote: > > > > > > > > Hi, > > > > > > > > At Linux Plumbers, a few dozen of us gathered together to discuss how > > > > to expose what tests subsystem maintainers would like to run for every > > > > patch submitted or when CI runs tests. We agreed on a mock up of a > > > > yaml template to start gathering info. The yaml file could be > > > > temporarily stored on kernelci.org until a more permanent home could > > > > be found. Attached is a template to start the conversation. > > > > > > > > > > I think that there are two (maybe three) separate problems here: > > > 1. What tests do we want to run (for a given patch/subsystem/environment/etc)? > > > > My thinking is this is maintainer's choice. What would they like to > > see run on a patch to verify its correctness? I would like to think > > most maintainers already have scripts they run before commiting > > patches to their -next branch. All I am trying to do is expose what > > is already being done I believe. > > > > Agreed. > > > > > > 2. How do we describe those tests in such a way that running them can > > > be automated? > > > > This is the tricky part. But I am going to assume that if most > > maintainers run tests before committing patches to their -next branch, > > then there is a good chance those tests are scripted and command line > > driven (this is the kernel community, right :-) ). So if we could > > expose those scripts and make the copy-and-pastable such that > > contributors or testers (including CI bots) can just copy and run > > them. 
Some maintainers have more complex environments and separating > > command line driven tests from the environment scripts might be > > tricky. > > > > Does that sound reasonable? > > Yeah: that's basically what I'd want. > > > > > > 3. (Exactly what constitutes a 'test'? A single 'test', a whole suite > > > of tests, a test framework/tool? What about the environment: is, e.g., > > > KUnit on UML different from KUnit on qemu-x86_64 different from KUnit > > > on qemu-arm64?) > > > > > > My gut feeling here is that (1) is technically quite easy: worst-case > > > we just make every MAINTAINERS entry link to a document describing > > > what tests should be run. Actually getting people to write these > > > documents and then run the tests, though, is very difficult. > > > > Well if I look at kunit or kselftest, really all you are doing as a > > subsystem maintainer is asking contributors or testers to run a 'make' > > command right? Everything else is already documented I think. > > > > > > > > (2) is the area where I think this will be most useful. We have some > > > arbitrary (probably .yaml) file which describes a series of tests to > > > run in enough detail that we can automate it. My ideal outcome here > > > would be to have a 'kunit.yaml' file which I can pass to a tool > > > (either locally or automatically on some CI system) which will run all > > > of the checks I'd run on an incoming patch. This would include > > > everything from checkpatch, to test builds, to running KUnit tests and > > > other test scripts. Ideally, it'd even run these across a bunch of > > > different environments (architectures, emulators, hardware, etc) to > > > catch issues which only show up on big-endian or 32-bit machines. > > > > > > If this means I can publish that yaml file somewhere, and not only > > > give contributors a way to check that those tests pass on their own > > > machine before sending a patch out, but also have CI systems > > > automatically run them (so the results are ready waiting before I > > > manually review the patch), that'd be ideal. > > > > Yes, that is exactly the goal of this exercise. :-) but instead of a > > kunit.yaml file, it is more of a test.yaml file with hundreds of > > subystems inside it (and probably a corresponding get_tests.pl > > script)[think how MAINTAINERS file operates and this is a sister > > file]. > > > > Inside the 'KUNIT' section would be a container of tests that would be > > expected to run (like you listed). Each test has its own command line > > and params. > > > > Yeah. My hope is we can have a "run_tests" tool which parses that > file/files and runs everything. > > So whether that ends up being: > run_tests --subsystem "KUNIT" --subsystem "MM" > or > run_test --file "kunit.yaml" --file "mm.yaml" > or even > run_test --patch "my_mm_and_kunit_change.patch" > > A CI system can just run it against the changed files in patches, a > user who wants to double check something specific can override it to > force the tests for a subsystem which may be indirectly affected. And > if you're working on some new tests, or some private internal ones, > you can keep your own yaml file and pass that along too. > > > > > > > > Longer story. > > > > > > > > The current problem is CI systems are not unanimous about what tests > > > > they run on submitted patches or git branches. This makes it > > > > difficult to figure out why a test failed or how to reproduce. 
> > > > Further, it isn't always clear what tests a normal contributor should > > > > run before posting patches. > > > > > > > > It has been long communicated that the tests LTP, xfstest and/or > > > > kselftests should be the tests to run. However, not all maintainers > > > > use those tests for their subsystems. I am hoping to either capture > > > > those tests or find ways to convince them to add their tests to the > > > > preferred locations. > > > > > > > > The goal is for a given subsystem (defined in MAINTAINERS), define a > > > > set of tests that should be run for any contributions to that > > > > subsystem. The hope is the collective CI results can be triaged > > > > collectively (because they are related) and even have the numerous > > > > flakes waived collectively (same reason) improving the ability to > > > > find and debug new test failures. Because the tests and process are > > > > known, having a human help debug any failures becomes easier. > > > > > > > > The plan is to put together a minimal yaml template that gets us going > > > > (even if it is not optimized yet) and aim for about a dozen or so > > > > subsystems. At that point we should have enough feedback to promote > > > > this more seriously and talk optimizations. > > > > > > > > Feedback encouraged. > > > > > > > > Cheers, > > > > Don > > > > > > > > --- > > > > # List of tests by subsystem > > > > > > I think we should split this up into several files, partly to avoid > > > merge conflicts, partly to make it easy to maintain custom collections > > > of tests separately. > > > > > > For example, fs.yaml could contain entries for both xfstests and fs > > > KUnit and selftests. > > > > I am not opposed to the idea. But I am a fan of the user experience. > > So while an fs.yaml might sound good, is it obvious to a contributor > > or tester that given a patch, do they know if fs.yaml is the correct > > yaml file to parse when running tests? How do you map a patch to a > > yaml file? I was trying to use subsystems like MAINTAINERS (and > > get_maintainers.pl) as my mapping. Open to better suggestions. > > > > One option would be to have multiple files, which still have the > MAINTAINERS subsystems listed within, and worst-case a tool just > parses all of the files in that directory until it finds a matching > one. Maybe a bit slower than having everything in the one file, but it > sidesteps merge conflicts well. > > But ideally, I'd like (as mentioned below) to have a tool which I can > use to run tests locally, and being able to run, e.g., > ./run_tests --all -f fs.yaml > If I want to specify the tests I want to run manually, personally I > think a filename would be a bit nicer than having to pass through, > e.g., subsystem names. > > > > > > > > It's also probably going to be necessary to have separate sets of > > > tests for different use-cases. For example, there might be a smaller, > > > quicker set of tests to run on every patch, and a much longer, more > > > expensive set which only runs every other day. So I don't think > > > there'll even be a 1:1 mapping between 'test collections' (files) and > > > subsystems. But an automated way of running "this collection of tests" > > > would be very useful, particularly if it's more user-friendly than > > > just writing a shell script (e.g., having nicely formatted output, > > > being able to run things in parallel or remotely, etc). > > > > I don't disagree. I am trying to start small to get things going and > > some momentum. 
I proposed a container of tests section. I would like > > to think adding another field in each individual test area like > > (short, medium, long OR mandatory, performance, nice-to-have) would be > > easy to add to the yaml file overall and attempt to accomplish what > > you are suggesting. Thoughts? > > > > I think that'd be a great idea. Maybe a "stage" field could work, too, > where later tests only run if the previous ones pass. For example: > Stage 0: checkpatch, does it build > Stage 1: KUnit tests, unit tests, single architecture > Stage 2: Full boot tests, selftests, etc > Stage 3: The above tests on other architectures, allyesconfig, randconfig, etc. > > Regardless, it'd be useful to be able to name individual tests and/or > configurations and manually trigger them and/or filter on them. > > _Maybe_ it makes sense to split up the "what tests to run" and "how > are they run" bits. The obvious split here would be to have the test > catalogue just handle the former, and the "how they're run" bit > entirely live in shell scripts. But if we're going to support running > tests in parallel and nicely displaying results, maybe there'll be a > need to have something more data driven than a shell script. > > > > > > > > # > > > > # Tests should adhere to KTAP definitions for results > > > > # > > > > # Description of section entries > > > > # > > > > # maintainer: test maintainer - name <email> > > > > # list: mailing list for discussion > > > > # version: stable version of the test > > > > # dependency: necessary distro package for testing > > > > # test: > > > > # path: internal git path or url to fetch from > > > > # cmd: command to run; ability to run locally > > > > # param: additional param necessary to run test > > > > # hardware: hardware necessary for validation > > > > # > > > > # Subsystems (alphabetical) > > > > > > > > KUNIT TEST: > > > > > > For KUnit, it'll be interesting to draw the distinction between KUnit > > > overall and individual KUnit suites. > > > I'd lean towards having a separate entry for each subsystem's KUnit > > > tests (including one for KUnit's own tests) > > > > KUNIT may not have been the best 'common' test example due to its > > complexities across other subsystems. :-/ > > > > Yeah: I think KUnit tests are a good example of the sorts of tests > which would be relatively easy to integrate, but KUnit as a subsystem > can be a bit confusing as an example because no-one's sure if we're > talking about KUnit-the-subsystem or KUnit-the-tests. > > > > > > > > maintainer: > > > > - name: name1 > > > > email: email1 > > > > - name: name2 > > > > email: email2 > > > > list: > > > > > > How important is it to have these in the case where they're already in > > > the MAINTAINERS file? I can see it being important for tests which > > > live elsewhere, though eventually, I'd still prefer the subsystem > > > maintainer to take some responsibility for the tests run for their > > > subsystems. > > > > I wasn't sure if all subsystem maintainers actually want to maintain > > the tests too or just point someone else at it. I look at LTP as an > > example here. But I could be wrong. > > > > Fair enough. Maybe we just make this optional, and if empty we > "default" to the subsystem maintainer. > > > > > > > > version: > > > > > > This field is probably unnecessary for test frameworks which live in > > > the kernel tree. > > > > Possibly. It was brought up at Plumbers, so I included it for completeness. > > > > Yeah. Again, good to have, but make it optional. 
> > > > > > > > dependency: > > > > - dep1 > > > > - dep2 > > > > > > If we want to automate this in any way, we're going to need to work > > > out a way of specifying these. Either we'd have to pick a distro's > > > package names, or have our own mapping. > > > > Agreed. I might lean on what 'perf' outputs. They do dependency > > detection and output suggested missing packages. Their auto detection > > of already included deps is rather complicated though. > > > > Sounds good. > > > > > > > (A part of me really likes the idea of having a small list of "known" > > > dependencies: python, docker, etc, and trying to limit tests to using > > > those dependencies. Though there are plenty of useful tests with more > > > complicated dependencies, so that probably won't fly forever.) > > > > Hehe. For Fedora/RHEL at least, python has hundreds of smaller > > library packages. That is tricky. And further some tests like to > > compile, which means a bunch of -devel packages. Each distro has > > different names for their -devel packages. :-/ > > But a side goal of this effort is to define some community standards. > > Perhaps we can influence things here to clean up this problem?? > > > > That'd be nice. :-) > > > > > > > > test: > > > > - path: tools/testing/kunit > > > > cmd: > > > > param: > > > > - path: > > > > cmd: > > > > param: > > > > > > Is 'path' here supposed to be the path to the test binary, the working > > > directory, etc? > > > Maybe there should be 'working_directory', 'cmd', 'args', and 'env'. > > > > The thought was the command to copy-n-paste to run the test after > > installing it. I am thinking most tests might be a git-clone or > > exploded tarball, leaving the path to be from the install point. So > > maybe working_directory is more descriptive. > > > > Sounds good. In the KUnit case, the tooling currently expects the > working directory to be the root of the kernel checkout, and the > command to be "./tools/testing/kunit/kunit.py"... > > > > > > > > hardware: none > > > > > > > > > > > > For KUnit, I'd imagine having a kunit.yaml, with something like this, > > > including the KUnit tests in the 'kunit' and 'example' suites, and the > > > 'kunit_tool_test.py' test script: > > > > > > --- > > > KUnit: > > > maintainer: > > > - name: David Gow > > > email: davidgow@google.com > > > - name: Brendan Higgins > > > email: brendan.higgins@linux.dev > > > list: kunit-dev@googlegroups.com > > > dependency: > > > - python3 > > > test: > > > - path: . > > > cmd: tools/testing/kunit.py > > > param: run kunit > > > - path: . > > > cmd: tools/testing/kunit.py > > > param: run example > > > hardware: none > > > KUnit Tool: > > > maintainer: > > > - name: David Gow > > > email: davidgow@google.com > > > - name: Brendan Higgins > > > email: brendan.higgins@linux.dev > > > list: kunit-dev@googlegroups.com > > > dependency: > > > - python3 > > > test: > > > - path: . > > > cmd: tools/testing/kunit_tool_test.py > > > param: > > > hardware: none > > > --- > > > > > > Obviously there's still some redundancy there, and I've not actually > > > tried implementing something that could run it. It also lacks any > > > information about the environment. In practice, I have about 20 > > > different kunit.py invocations which run the tests with different > > > configs and on different architectures. Though that might make sense > > > to keep in a separate file to only run if the simpler tests pass. 
And > > > equally, it'd be nice to have a 'common.yaml' file with basic patch > > > and build tests which apply to almost everything (checkpatch, make > > > defconfig, maybe even make allmodconfig, etc). > > > > Nice, thanks for the more detailed example. > > > > Thanks, > -- David ^ permalink raw reply [flat|nested] 17+ messages in thread
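For reference, a test.yaml entry that would produce the get_tests.py output shown earlier in the thread might look roughly like this; it is reconstructed from the tool output above, so the exact schema in the test-catalog tree at [0] may differ.

---
KUNIT TEST:
  maintainer:
    - name: David Gow
      email: davidgow@google.com
  dependency:
    - python3-mypy
  test:
    smoke:
      url:
      working_directory:
      cmd: ./tools/testing/kunit/kunit.py
      env:
      param: run --kunitconfig lib/kunit
      hardware: arm64, x86_64
---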
* Re: [RFC] Test catalog template 2024-11-06 17:01 ` Donald Zickus @ 2024-11-20 8:16 ` David Gow 2024-11-21 15:28 ` Donald Zickus 0 siblings, 1 reply; 17+ messages in thread From: David Gow @ 2024-11-20 8:16 UTC (permalink / raw) To: Donald Zickus Cc: workflows, automated-testing, linux-kselftest, kernelci, Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao [-- Attachment #1: Type: text/plain, Size: 2967 bytes --] On Thu, 7 Nov 2024 at 01:01, Donald Zickus <dzickus@redhat.com> wrote: > > Hi, > > Thanks for the feedback. I created a more realistic test.yaml file to > start (we can split it when more tests are added) and a parser. I was > going to add patch support as input to mimic get_maintainers.pl > output, but that might take some time. For now, you have to manually > select a subsystem. I will try to find space on kernelci.org to grow > this work but you can find a git tree here[0]. > > From the README.md > """ > An attempt to map kernel subsystems to kernel tests that should be run > on patches or code by humans and CI systems. > > Examples: > > Find test info for a subsystem > > ./get_tests.py -s 'KUNIT TEST' --info > > Subsystem: KUNIT TEST > Maintainer: > David Gow <davidgow@google.com> > Mailing List: None > Version: None > Dependency: ['python3-mypy'] > Test: > smoke: > Url: None > Working Directory: None > Cmd: ./tools/testing/kunit/kunit.py > Env: None > Param: run --kunitconfig lib/kunit > Hardware: arm64, x86_64 > > Find copy-n-pastable tests for a subsystem > > ./get_tests.py -s 'KUNIT TEST' > > ./tools/testing/kunit/kunit.pyrun --kunitconfig lib/kunit > """ > > Is this aligning with what people were expecting? > Awesome! I've been playing around a bit with this, and I think it's an excellent start. There are definitely some more features I'd want in an ideal world (e.g., configuration matrices, etc), but this works well enough. I've been playing around with a branch which adds the ability to actually run these tests, based on the 'run_checks.py' script we use for KUnit: https://github.com/sulix/test-catalog/tree/runtest-wip In particular, this adds a '-r' option which runs the tests for the subsystem in parallel. This largely matches what I was doing manually — for instance, the KUnit section in test.yaml now has three different tests, and running it gives me this result: ../test-catalog/get_tests.py -r -s 'KUNIT TEST' Waiting on 3 checks (kunit-tool-test, uml, x86_64)... kunit-tool-test: PASSED x86_64: PASSED uml: PASSED (Obviously, in the real world, I'd have more checks, including other architectures, checkpatch, etc, but this works as a proof-of-concept for me.) I think the most interesting questions will be: - How do we make this work with more complicated dependencies (containers, special hardware, etc)? - How do we integrate it with CI systems — can we pull the subsystem name for a patch from MAINTAINERS and look it up here? - What about things like checkpatch, or general defconfig build tests which aren't subsystem-specific? - How can we support more complicated configurations or groups of configurations? - Do we add support for specific tools and/or parsing/combining output? But I'm content to keep playing around with this a bit more for now. Thanks, -- David [-- Attachment #2: S/MIME Cryptographic Signature --] [-- Type: application/pkcs7-signature, Size: 5294 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
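A minimal sketch of the parallel '-r' behaviour described above, assuming the catalog reduces to a name-to-command map. The three command strings mirror the checks in the output, but the actual invocations used in the runtest-wip branch are assumptions here.

import concurrent.futures
import subprocess

def run_check(name, cmd):
    # Run one catalog command from the kernel tree root; pass/fail is the exit status.
    result = subprocess.run(cmd, shell=True, cwd=".", capture_output=True, text=True)
    return name, result.returncode == 0

# Hypothetical commands for the three checks listed in the output above.
checks = {
    "kunit-tool-test": "./tools/testing/kunit/kunit_tool_test.py",
    "uml": "./tools/testing/kunit/kunit.py run --kunitconfig lib/kunit",
    "x86_64": "./tools/testing/kunit/kunit.py run --arch=x86_64 --kunitconfig lib/kunit",
}

print(f"Waiting on {len(checks)} checks ({', '.join(sorted(checks))})...")
with concurrent.futures.ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_check, name, cmd) for name, cmd in checks.items()]
    for future in concurrent.futures.as_completed(futures):
        name, passed = future.result()
        print(f"{name}: {'PASSED' if passed else 'FAILED'}")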
* Re: [RFC] Test catalog template 2024-11-20 8:16 ` David Gow @ 2024-11-21 15:28 ` Donald Zickus 0 siblings, 0 replies; 17+ messages in thread From: Donald Zickus @ 2024-11-21 15:28 UTC (permalink / raw) To: David Gow Cc: workflows, automated-testing, linux-kselftest, kernelci, Nikolai Kondrashov, Gustavo Padovan, kernelci-members, laura.nao Hi David, On Wed, Nov 20, 2024 at 3:16 AM David Gow <davidgow@google.com> wrote: > > On Thu, 7 Nov 2024 at 01:01, Donald Zickus <dzickus@redhat.com> wrote: > > > > Hi, > > > > Thanks for the feedback. I created a more realistic test.yaml file to > > start (we can split it when more tests are added) and a parser. I was > > going to add patch support as input to mimic get_maintainers.pl > > output, but that might take some time. For now, you have to manually > > select a subsystem. I will try to find space on kernelci.org to grow > > this work but you can find a git tree here[0]. > > > > From the README.md > > """ > > An attempt to map kernel subsystems to kernel tests that should be run > > on patches or code by humans and CI systems. > > > > Examples: > > > > Find test info for a subsystem > > > > ./get_tests.py -s 'KUNIT TEST' --info > > > > Subsystem: KUNIT TEST > > Maintainer: > > David Gow <davidgow@google.com> > > Mailing List: None > > Version: None > > Dependency: ['python3-mypy'] > > Test: > > smoke: > > Url: None > > Working Directory: None > > Cmd: ./tools/testing/kunit/kunit.py > > Env: None > > Param: run --kunitconfig lib/kunit > > Hardware: arm64, x86_64 > > > > Find copy-n-pastable tests for a subsystem > > > > ./get_tests.py -s 'KUNIT TEST' > > > > ./tools/testing/kunit/kunit.pyrun --kunitconfig lib/kunit > > """ > > > > Is this aligning with what people were expecting? > > > > > Awesome! I've been playing around a bit with this, and I think it's an > excellent start. > > There are definitely some more features I'd want in an ideal world > (e.g., configuration matrices, etc), but this works well enough. Yeah, I was trying to nail down the usability angle first before expanding with bells and whistles. I would like to think the yaml file is flexible enough to handle those features though?? > > I've been playing around with a branch which adds the ability to > actually run these tests, based on the 'run_checks.py' script we use > for KUnit: > https://github.com/sulix/test-catalog/tree/runtest-wip Thanks! > > In particular, this adds a '-r' option which runs the tests for the > subsystem in parallel. This largely matches what I was doing manually > — for instance, the KUnit section in test.yaml now has three different > tests, and running it gives me this result: > ../test-catalog/get_tests.py -r -s 'KUNIT TEST' > Waiting on 3 checks (kunit-tool-test, uml, x86_64)... > kunit-tool-test: PASSED > x86_64: PASSED > uml: PASSED Interesting. Originally I was thinking this would be done serially. I didn't think tests were safe enough to run in parallel. I am definitely open to this. My python isn't the best, but I think your PR looks reasonable. > > (Obviously, in the real world, I'd have more checks, including other > architectures, checkpatch, etc, but this works as a proof-of-concept > for me.) > > I think the most interesting questions will be: > - How do we make this work with more complicated dependencies > (containers, special hardware, etc)? I was imagining a 'hw-requires' type line to handle the hardware requests as that seemed natural for a lot of the driver work. 
Run a quick check before running the test to see if the required hw is present or not and bail if it isn't. The containers piece is a little trickier and ties into the test environment I think. The script would have to create an environment and inject the tests into the environment and run them. I would imagine some of this would have to be static as the setup is complicated. For example, a 'container' label would execute custom code to setup a test environment inside a container. Open to ideas here. > - How do we integrate it with CI systems — can we pull the subsystem > name for a patch from MAINTAINERS and look it up here? There are two thoughts. First is yes. As a developer you probably want to run something like 'get_maintainers.sh <patch> | get_tests.py -s -' or something to figure out what variety of tests you should run before posting. And a CI system could probably do something similar. There is also another thought, you already know the subsystem you want to test. For example, a patch is usually written for a particular subsystem that happens to touch code from other subsystems. You primarily want to run it against a specified subsystem. I know Red Hat's CKI will run against a known subsystem git-tree and would fall into this category. While it does leave a gap in other subsystem testing, sometimes as a human you already know running those extra tests is mostly a no-op because it doesn't really change anything. > - What about things like checkpatch, or general defconfig build tests > which aren't subsystem-specific? My initial thought is that this is another category of testing. A lot of CI tests are workload testing and have predefined configs. Whereas a generic testing CI system (think 0-day) would focus on those types of testing. So I would lean away from those checks in this approach or we could add a category 'general' too. I do know checkpatch rules vary from maintainer to maintainer. > - How can we support more complicated configurations or groups of > configurations? Examples? > - Do we add support for specific tools and/or parsing/combining output? Examples? I wasn't thinking of parsing test output, just providing what to run as a good first step. My initial thought was to help nudge tests towards the KTAP output?? > > But I'm content to keep playing around with this a bit more for now. Thank you! Please do! Cheers, Don ^ permalink raw reply [flat|nested] 17+ messages in thread
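The "check the required hardware and bail" idea at the top of this message could start as small as the sketch below. The hardware strings come straight from the catalog's hardware field; mapping them to machine architectures is an assumption of this sketch, and real hardware matching (PCI IDs, attached devices, etc.) would need more than this.

import platform
import sys

# The catalog currently lists values like "none", "arm64" or "x86_64".
# Note that arm64 machines report themselves as "aarch64".
ARCH_ALIASES = {"arm64": "aarch64"}

def hardware_available(requirement):
    if requirement in (None, "none"):
        return True
    wanted = ARCH_ALIASES.get(requirement, requirement)
    return platform.machine() == wanted

if not hardware_available("arm64"):
    print("SKIP: required hardware not present")
    sys.exit(0)   # skip, don't fail, when the hardware isn't there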
end of thread -- Thread overview: 17+ messages
2024-10-14 20:32 [RFC] Test catalog template Donald Zickus
2024-10-15 16:01 ` [Automated-testing] " Bird, Tim
2024-10-16 13:10   ` Cyril Hrubis
2024-10-16 18:02     ` Donald Zickus
2024-10-17 11:01       ` Cyril Hrubis
2024-10-16 18:00   ` Donald Zickus
2024-10-17 12:31 ` Minas Hambardzumyan
2024-10-18 19:44   ` Donald Zickus
2024-10-18  7:21 ` David Gow
2024-10-18 14:23   ` Gustavo Padovan
2024-10-18 14:35     ` [Automated-testing] " Cyril Hrubis
2024-10-18 19:17       ` Mark Brown
2024-10-18 20:17   ` Donald Zickus
2024-10-19  6:36     ` David Gow
2024-11-06 17:01       ` Donald Zickus
2024-11-20  8:16         ` David Gow
2024-11-21 15:28           ` Donald Zickus