Revision 19 (okurz, 2016-10-21 07:07) → Revision 20/62 (okurz, 2016-11-21 14:47)

# Introduction 

 {{toc}} 

 Also see https://progress.opensuse.org/projects/openqav3/wiki 


 # Organisational 

 ## ticket workflow 

 The following ticket statuses are used, with these meanings: 

 * *New*: No one has worked on the ticket or no one is feeling responsible for the work on this ticket 
 * *In Progress*: Any state between *New* and *Resolved* 
 * *Resolved*: All work on the issue is complete and the corresponding issue is expected to be fixed as observed (should be updated together with a link to a verification job run on a production openQA instance or at least a merged pull request) 
 * *Feedback*: Further work on the ticket is blocked by an external dependency or open points. Sometimes also used to ask the assignee about progress after inactivity 
 * *Rejected*: The issue is considered invalid, should not be done, or is considered out of scope. 
 * *Closed*: As this can be set only by administrators it is suggested to not use this status. 

 It is good practice to update the status together with a comment about it, e.g. a link to a pull request or a reason for rejection. 
 Also see the [[Wiki#Definition-of-DONE|Definition-of-DONE]] on the use of ticket status, especially when to set *Resolved*. 


 # test organization on https://openqa.suse.de/ 

 ## job group names 

 ### Job group names should be consistent and structured for easy (daily) review of the current status 

 template: 
 ``` 
 <product_group_short_name> <order_nr>.<product_variant> 
 ``` 
 e.g. "SLE 12 SP1 1.Server". Keep the whitespace for separation consistent, also see https://progress.opensuse.org/issues/9916 
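The template can be applied mechanically, for example when several groups are created at once. The following is only a minimal sketch; the function name `job_group_name` is hypothetical and not part of openQA.

```shell
# Hypothetical helper, not part of openQA: build a job group name following
# "<product_group_short_name> <order_nr>.<product_variant>".
job_group_name() {
    # $1: product group short name, $2: order number, $3: product variant
    printf '%s %s.%s\n' "$1" "$2" "$3"
}

job_group_name "SLE 12 SP1" 1 "Server"   # prints: SLE 12 SP1 1.Server
```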

 ### Released products should be named with a prefix 'x' to show up late in the overview page 

 This way we can keep track if tests fail even though the product does not produce new builds. This could help us crosscheck tests. E.g. "x-released SLE 12 SP1 1.Server". 

 A lowercase "x" is used because all our product names start with capital letters. Sorting works regardless (or uppercase first?). 
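The effect of the lowercase prefix can be checked with a plain byte-wise sort, where uppercase letters come before lowercase ones. This is only a sketch: that the openQA overview page uses the same collation is an assumption, and "SLE 12 SP2 1.Server" is an illustrative group name.

```shell
# Byte-wise (C locale) comparison orders uppercase letters before lowercase
# ones, so a group prefixed with a lowercase "x" sorts after all "SLE ..." groups.
# "SLE 12 SP2 1.Server" is an illustrative name, not taken from a real instance.
sorted_groups="$(printf '%s\n' \
    "x-released SLE 12 SP1 1.Server" \
    "SLE 12 SP2 1.Server" \
    "SLE 12 SP1 1.Server" | LC_ALL=C sort)"
printf '%s\n' "$sorted_groups"
```

With a locale-aware collation the order may differ, which is why the sketch pins `LC_ALL=C`.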

 For now we do not retrigger tests on old builds automatically, but any test developer may retrigger them manually, e.g. when suspecting that the tests broke and wanting to confirm that local changes are not at fault. 

 

 # needling best practices 
 There are also other locations where "needling best practices" can be found but we should also have the possibility to keep something on the wiki. Feel free to contact me (okurz) and tell me where it should be instead if there is a better place. Also look into [openQA Pitfalls](https://github.com/os-autoinst/openQA/blob/master/docs/Pitfalls.asciidoc) 

 ## applying "workaround" needles 
 If a test reveals a product issue of minor importance it can make sense to create a needle with the property "workaround" set. This way, if the needle is matched, the test records this as a "soft-fail". To be able to track the product issue, follow up on it and eventually delete the workaround needle once the product issue is fixed, the product issue should be recorded in the needle name itself and ideally also in the git commit message adding the needle. If test changes are necessary, the source code should have a corresponding comment referencing the issue as well as marking the start and end of the test procedure that is necessary for applying the workaround. Example for a needle name: "gdm-workaround-bsc962806-20160125", referencing bsc#962806. 

 *keep in mind:* 
 Since [gh-os-autoinst#532](https://github.com/os-autoinst/os-autoinst/pull/532) workaround needles are always preferred. Otherwise, if two needles match, the first one in alphabetical order wins. 
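Recording the issue in the needle name makes it possible to list the referenced bugs mechanically, e.g. when checking which workaround needles can be deleted. A minimal sketch, assuming needle names follow the pattern of the example above (`<tag>-workaround-<tracker><number>-<YYYYMMDD>`); the helper name `needle_bug_ref` is hypothetical.

```shell
# Hypothetical helper: print the bug reference encoded in a workaround needle
# name of the assumed form "<tag>-workaround-<tracker><number>-<YYYYMMDD>".
# Prints nothing if the name does not follow that pattern.
needle_bug_ref() {
    printf '%s\n' "$1" |
        sed -n 's/^.*-workaround-\([a-z][a-z]*\)\([0-9][0-9]*\)-[0-9]\{8\}$/\1#\2/p'
}

needle_bug_ref "gdm-workaround-bsc962806-20160125"   # prints: bsc#962806
```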

 

 ## do not overwrite old needles because old date confuses people 
 The needle editor automatically adds a timestamp of the current day to new needles. When updating a needle, do not overwrite the needle carrying the old date tag; save the update as a new needle instead, because an updated needle with an old date looks really weird in the needle editor and confuses people. 
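When needles are created outside the needle editor, the same naming can be reproduced by hand. The following sketch assumes the date tag is a trailing `-YYYYMMDD` as in the needle name example on this page; the helper name `needle_name_for_date` is hypothetical.

```shell
# Hypothetical helper: derive the name for an updated needle by replacing a
# trailing "-YYYYMMDD" date tag (if present) with the given date.
needle_name_for_date() {
    # $1: existing needle name, $2: date as YYYYMMDD
    base="${1%-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]}"
    printf '%s-%s\n' "$base" "$2"
}

# Name for a needle updated today:
needle_name_for_date "gdm-workaround-bsc962806-20160125" "$(date +%Y%m%d)"
```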

 

 ## needle individual column entries in tables 
 **Problem**: Tables might auto-adjust column size based on content. Therefore it is unsafe to create needles covering multiple columns in a row. Failing example: https://openqa.suse.de/tests/441169#step/yast2_snapper/23 
 **Solution**: Needles support multiple areas. Use them to needle individual cells in this example. 


 ## don't include version specific content in needles 

 **Problem**: A needle that covers the version number of an application or product often fails with every update, e.g. see [opensuse-42.2-DVD-x86_64-Build0112-xfce@64bit](https://openqa.opensuse.org/tests/228793#step/firefox/10). Obviously the needle does not match because no one so far created a needle for firefox 47 on Leap 42.2 on xfce. 
 **Solution**: openQA in general supports exclusion areas and even OCR but these have their [flaws](https://progress.opensuse.org/issues/12858). For now it is better to carefully select matching areas so that versions are not included, as in the following example 
 ![needling example](openQA_needle_firefox_wo_version_cropped.png). 

 

 # Definition of DONE/READY 

 Each of the following points has to be fulfilled to regard individual contributions as *DONE*. Not every step has to be done by the same person. Overall completion is the responsibility of the whole team. 

 

 ## Definition of DONE 

 Also see http://www.allaboutagile.com/definition-of-done-10-point-checklist/ and https://www.scrumalliance.org/community/articles/2008/september/what-is-definition-of-done-%28dod%29 

 The following definitions are used to ensure that development on individual tests has been completed, covering all existing workflows, e.g. "hot-fixes" on the production instance as well as contributions by new contributors with no previous experience and no control over needle generation on production instances. 

 * Code changes are made available via a pull request on the github repository 
 * New tests as individual test modules (i.e. files under `tests/`): They are loaded in main.pm of sle and/or opensuse  
 * "make test" works (e.g. automatic travis CI check triggered on each github PR) 
 * [Guidelines for git commits](http://chris.beams.io/posts/git-commit/) have been followed 
 * Code has been reviewed (e.g. in the github PR) 
 * Favored, but depending on criticality/complexity/size: A local verification test has been run, e.g. post link to a local openQA machine or screenshot or logfile 
 * Test modules that have been touched have updated metadata, e.g. "Maintainer" and "Summary" (#13034) 
 * Code has been merged (either by reviewer or reviewee after 'LGTM' from others) 
 * Code has been deployed to osd and o3 (automatic git sync every few minutes) 
 * If new variables are necessary (feature toggles): A test_suite is executing the test, e.g. test_suite is created or variable is added to existing test_suite over web interface configuration on osd and/or o3 
 * If a new test_suite has been created: The test_suite is added to at least one job_group 
 * Necessary needles are made available as PR for sle and/or opensuse (depending on where the test is executed, see above for 'main.pm') or are created on the production instance 
 * At least one successful test run has been observed on osd or o3 and referenced in the corresponding progress item or bugzilla bug report if one exists 

 

 ## Definition of READY for new tests 

 The following points should be considered before a new test is READY to be implemented: 

 * Either a product bug has been discovered for which there is no automated test in openQA or a FATE request for new features exists 
 * A test case description exists depicting the prerequisites of the test, the steps to conduct and the expected result 
 * The impact and applicability for both SLE and openSUSE products has been considered 

 # Test development instances (staging openQA instances) 

 Contributors cannot afford to verify a newly developed test in all scenarios run by o3 or osd, so tests will break from time to time. It would be useful to have a machine running a subset of the scenarios of the official instance(s) so that new tests can be deployed with some degree of confidence. However, no "staging openQA instance" would be able to run everything that is run in production; it just does not scale. Only a subset can be run, so something can always be missing. We also do not have the hardware capacity to cover everything twice while also considering SLE plus openSUSE. Our [DOD](https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Definition-of-DONEREADY) should cover some important steps so that external contributors are motivated to test something locally first. We have a good test review process, and the reviewer has to decide whether to accept the risk of a new test with or without a local verification, and covering which scenarios. Depending on the contributors it might make sense to set up a staging server with a subset of tests which is used by multiple test developers to share the burden of openQA setup and administration. For example the YaST team has one available: https://wiki.microfocus.net/index.php/YAST/openQA 
 If you want to follow this model you can watch [this talk by Christopher Hofmann from the OSC16](https://events.opensuse.org/conference/oSC16/program/proposal/986) or ask the YaST team for their experiences. 


 # Tips for test development and issue investigation 

 Examples mentioned here use `clone_job` and `client`. Replace these with calls to the scripts of the corresponding name within your openQA installation, with proper arguments to provide your API key as well as the host selection, e.g. `/usr/share/openqa/script/client --host https://openqa.opensuse.org` with your API key configured in `~/.config/openqa/client.conf`. 
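For orientation, a client configuration could look like the following sketch. It assumes the INI-style layout with one section per host and `key`/`secret` entries; the values shown are placeholders, not real credentials. The file is written to a scratch directory so that an existing `~/.config/openqa/client.conf` is not overwritten.

```shell
# Write an example client.conf into a scratch directory; copy it to
# ~/.config/openqa/client.conf only after filling in your own key and secret.
conf_dir="$(mktemp -d)"
cat > "$conf_dir/client.conf" <<'EOF'
[openqa.opensuse.org]
key = 1234567890ABCDEF
secret = 1234567890ABCDEF
EOF
cat "$conf_dir/client.conf"
```

The key and secret are generated in the web interface of the corresponding openQA instance.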

 ## Uploading image files to the openQA server and running a test on them 

 You can manually trigger a test job with explicit name as one-shot overriding the variables as necessary, for example: 

 as geekotest@openqa: 

 ``` 
 cd /var/lib/openqa/factory/hdd 
 wget http://<my_host>/<path>.qcow2 -O <new_image_name>.qcow2 
 cd /var/lib/openqa/factory/iso 
 /usr/share/openqa/script/client isos post --params SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json HDD_1=SLE-12-Server-ppc64le-GM-gnome_with_snapper.qcow2 TEST=migration_offline_sle12_ppc BUILD=1651_<your_short_name> 
 ``` 

 why `SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.6.json`? I checked `SLE-12-SP2-Server-DVD-ppc64le-Build1651-Media1.iso.?.json`: There are `…5…` and `…6…`. `…5…` is for *HA* so I chose 6. 

 The job can be cleaned afterwards to tidy up the build history with: 

 ``` 
 client jobs/463859 delete 
 ``` 

 ## Create new HDD image with openQA 
 ``` 
 client jobs post DISTRI=sle VERSION=12 FLAVOR=Server-DVD ARCH=ppc64le BACKEND=qemu \ 
 NOVIDEO=1 OFW=1 QEMUCPU=host SERIALDEV=hvc0 BUILD=okurz_poo9714 \ 
 ISO=SLE-12-Server-DVD-ppc64le-GM-DVD1.iso INSTALLONLY=1 QEMU_COMPRESS_QCOW2=1 \ 
 PUBLISH_HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 TEST=create_gm_ppc_image \ 
 MACHINE=ppc64le WORKER_CLASS=qemu_ppc64le HDDSIZEGB=20 MAX_JOB_TIME=86400 TIMEOUT_SCALE=10 
 ``` 

 The `MAX_JOB_TIME=86400 TIMEOUT_SCALE=10` allows for interactive login during the process in case you want to manually adjust or debug. Beware though that `TIMEOUT_SCALE=10` also scales the waiting time on `check_screen` so that the whole job might take longer to execute. 

 To run a test based on the new HDD image, search for a suitable existing job and clone it with an adjusted parameter: 

 ``` 
 clone_job 462022 HDD_1=SLES-12-GM-gnome-ppc64le_snapper_20g.qcow2 
 ``` 


 ## Interactive investigation 

 While a job is running one can connect to the worker (if network access is possible) using VNC. One challenge is that the test is still running and manual interaction with the system interferes with the test and vice versa. 


 ### Making the test stop for long enough to be able to connect 

 If you can change the test code, i.e. if running on a development machine, you can for example add a `sleep 3600;` or `wait_serial 'CONTINUE';` at the point in the test where you want to connect to the system and interact with it, e.g. to gather additional logs. In case of `wait_serial 'CONTINUE';` you can echo 'CONTINUE' to the serial port to let the test continue, e.g. call `echo 'CONTINUE' > /dev/ttyS0;`. 

 If you cannot or do not want to change the test code, or your test run stops anyway at a certain point with a long enough timeout, you can also increase the timeout with `TIMEOUT_SCALE`, e.g. trigger the job with the variable `TIMEOUT_SCALE=10`. For example a `script_run` with the default timeout of 90 seconds will then wait for 900 seconds (= 15 minutes), which should give enough time in most cases. 
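The scaling is a plain multiplication, so the waiting time for other combinations can be estimated up front. A small worked example using the numbers from the paragraph above (the variable names are only illustrative):

```shell
# Effective timeout = default timeout * TIMEOUT_SCALE
default_timeout=90   # seconds, the script_run default mentioned above
timeout_scale=10     # value passed as TIMEOUT_SCALE
effective=$(( default_timeout * timeout_scale ))
echo "${effective} s = $(( effective / 60 )) min"   # prints: 900 s = 15 min
```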


 ### Connecting over VNC 

 Then connect over VNC to the instance once it has stalled. The VNC port is shown on the job live view as a hover text on the instance name. Make sure to use a "shared" connection in your vncviewer. `krdc`, the default KDE VNC viewer, as well as `vinagre`, the default GNOME VNC viewer, do this already. For TigerVNC use for example: 

 ``` 
 vncviewer -Shared malbec.arch:91 
 ``` 
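The `:91` in the example is a VNC display number. Some viewers or firewall rules need the TCP port instead; by the usual VNC convention display N corresponds to TCP port 5900 + N. A small sketch (the helper name `vnc_port` is hypothetical):

```shell
# Conventional VNC mapping: display N listens on TCP port 5900 + N.
vnc_port() {
    # $1: VNC display number
    echo $(( 5900 + $1 ))
}

vnc_port 91   # prints: 5991
```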


 ### Forwarding of special shortcuts 

 The default `vncviewer` in openSUSE/SUSE systems is recommended as it can also be used to forward special keyboard shortcuts. E.g. to change to a text console: 
 Press *F8* in vncviewer, select *ctrl* and *alt* in the menu, exit the menu, press *F2*. 


 ### Requesting video when by default you do not have video in your environment 

 Example: 

 ``` 
 clone_job 464665 NOVIDEO=0 
 ```