action #75310

closed

Entity in active state in local DB but already deleted in PC causes endless loop

Added by asmorodskyi over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
2020-10-26
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

If an entity is in the ACTIVE state in PCW while it has already been deleted in PC, this leads to an endless loop of exceptions.
It looks like the code (at least for EC2) does not expect that an entity might disappear after being written to our DB,
so we keep throwing exceptions over and over again.

Actions #1

Updated by asmorodskyi over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to asmorodskyi
Actions #2

Updated by asmorodskyi over 3 years ago

  1. PCW logged in its internal DB that 4 entities exist
  2. Pavel Dostál deleted them manually
  3. The case where someone outside PCW deletes them was not handled correctly, so PCW started throwing exceptions and constantly trying to delete them 🙂
Actions #3

Updated by asmorodskyi over 3 years ago

https://github.com/cfconrad/pcw/pull/91 - just to firefight the problem.
Things that still need to be done:

  • cover this function with tests
  • cover the same scenario for Azure and GCE
Actions #4

Updated by asmorodskyi over 3 years ago

Unfortunately, the first attempt to fix the issue failed (the spam remains). Here is the second one:
https://github.com/cfconrad/pcw/pull/93

Actions #5

Updated by cfconrad over 3 years ago

Looks like code ( at least for EC2 ) not expecting situation that entity might disappear after been written down to our DB

The code which handles such a situation is in ocw.lib.db.sync_csp_to_local_db().
The code works like this:

  • Set all DB entries for the given provider/namespace to active=false
  • Get all instances from the provider
  • Set active=true only on those DB entries for which an instance was found
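
The three steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual ocw.lib.db.sync_csp_to_local_db() code: the Record class, the sync_to_local_db() name, and the list-based "DB" are assumptions made for the example, as is the choice to create a record for an instance that is not yet known.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for one row of the local DB."""
    provider: str
    namespace: str
    instance_id: str
    active: bool = True


def sync_to_local_db(db, provider, namespace, provider_instance_ids):
    """Deactivate every entry in scope, then re-activate what the provider reports."""
    scoped = [r for r in db if r.provider == provider and r.namespace == namespace]

    # Step 1: set all entries for this provider/namespace to inactive.
    for record in scoped:
        record.active = False

    # Steps 2 and 3: walk the instances reported by the provider and
    # re-activate only the entries that were actually found.
    known = {r.instance_id: r for r in scoped}
    for instance_id in provider_instance_ids:
        if instance_id in known:
            known[instance_id].active = True
        else:
            # Assumption for this sketch: unknown instances get a new record.
            db.append(Record(provider, namespace, instance_id, active=True))
    return db
```

With this scheme an entry whose instance has vanished from the provider stays inactive after the sync, without needing any explicit "deleted elsewhere" check.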

I would say that the error appears because we do not handle exceptions during deletion correctly in ocw.lib.db.delete_instance() https://github.com/cfconrad/pcw/blob/13c33393882e898089daa6ed678e29033c8f07a6/ocw/lib/db.py#L236 and so we do not update the DB.

I wonder why this error doesn't heal on the next sync!

And that situation cannot be avoided with our current design (and I do not have a design in mind where it could :)
because we always have this race: an instance can be deleted just before we press the delete button.
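
Since the race itself cannot be removed, one way to keep it from looping is to treat "instance not found" during deletion as success and still update the local record. The sketch below only illustrates that idea; it is not the code from the pull requests mentioned in this ticket, and InstanceNotFound, the dict-based record, and the provider_delete callable are all hypothetical names.

```python
class InstanceNotFound(Exception):
    """Hypothetical error raised when the provider no longer knows the instance."""


def delete_instance(record, provider_delete):
    """Delete an instance and keep the local record consistent with the outcome.

    `record` is a dict with "instance_id" and "active" keys, and
    `provider_delete` is a callable doing the provider-side deletion.
    Both are assumptions made for this sketch.
    """
    try:
        provider_delete(record["instance_id"])
    except InstanceNotFound:
        # Someone deleted the instance outside PCW: the goal is already reached.
        pass
    # Reached on success and on "not found". Any other exception propagates
    # before this line, so the record stays active and can be retried later.
    record["active"] = False
    return record
```

For example, calling delete_instance() with a provider_delete that raises InstanceNotFound marks the record inactive instead of raising again on every run.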

Actions #6

Updated by asmorodskyi over 3 years ago

  • Status changed from In Progress to Resolved

The second attempt successfully fixed the issue.
