action #75310
closed
Entity in active state in local DB but already deleted in PC causes endless loop
0%
Description
If an entity is in the ACTIVE state in PCW but has already been deleted in PC, this leads to an endless loop of exceptions.
It looks like the code (at least for EC2) does not expect that an entity might disappear after it has been written to our DB,
so we keep throwing exceptions over and over again.
Updated by asmorodskyi over 3 years ago
- Status changed from New to In Progress
- Assignee set to asmorodskyi
Updated by asmorodskyi over 3 years ago
- PCW recorded in its internal DB that 4 entities exist
- Pavel Dostál deleted them manually
- The case where someone outside PCW deletes them was not handled correctly, so PCW started throwing exceptions and constantly tried to delete them 🙂
Updated by asmorodskyi over 3 years ago
https://github.com/cfconrad/pcw/pull/91 - just to firefight the problem.
Things that still need to be done:
- cover this function with tests
- cover the same scenario for Azure and GCE
Updated by asmorodskyi over 3 years ago
Unfortunately, the first attempt to fix the issue failed (the spam remains). Here is the second one:
https://github.com/cfconrad/pcw/pull/93
Updated by cfconrad over 3 years ago
It looks like the code (at least for EC2) does not expect that an entity might disappear after it has been written to our DB
The code that handles this situation is in ocw.lib.db.sync_csp_to_local_db().
The code works like this:
- Set all DB entries for that provider/namespace to active=false
- Get all instances from the provider
- Set active=true only on those DB entries for which an instance was found
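The three steps above can be sketched roughly as follows. This is only an illustration and not the actual ocw.lib.db code: the Record class, the sync_to_local_db name and the in-memory list standing in for the DB are all hypothetical, and creating entries for newly discovered instances is left out.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for one row of the local DB."""
    provider: str
    namespace: str
    instance_id: str
    active: bool = True


def sync_to_local_db(records, provider, namespace, live_ids):
    """Mark-and-sweep style sync for one provider/namespace (sketch only)."""
    scoped = [r for r in records
              if r.provider == provider and r.namespace == namespace]
    # Step 1: assume everything in this scope is gone.
    for r in scoped:
        r.active = False
    # Steps 2 and 3: re-activate only the entries the provider still reports.
    live = set(live_ids)
    for r in scoped:
        if r.instance_id in live:
            r.active = True
    return records
```

With this shape, an entry whose instance was deleted outside PCW simply stays at active=false after the next sync, and entries of other providers or namespaces are not touched.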
I would say that the error appears because we do not handle exceptions during deletion correctly in ocw.lib.db.delete_instance()
https://github.com/cfconrad/pcw/blob/13c33393882e898089daa6ed678e29033c8f07a6/ocw/lib/db.py#L236 and so we do not update the DB.
I wonder why this error doesn't heal on the next sync!
And that situation cannot be avoided with our current design (and I do not have a design in mind where it could be :)
We will always have this race: an instance can be deleted just before we press the delete button.
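Since the race itself cannot be removed, one way to live with it is to treat "instance not found" during deletion as success and still update the DB. The sketch below only illustrates that idea and is not what the linked pull requests do: InstanceNotFound, FakeProvider and this delete_instance signature are hypothetical, and a real provider client would raise its own error type.

```python
class InstanceNotFound(Exception):
    """Hypothetical error a provider client raises for an unknown instance."""


class FakeProvider:
    """Hypothetical in-memory provider, used only to make the sketch runnable."""

    def __init__(self, ids):
        self.ids = set(ids)

    def delete(self, instance_id):
        if instance_id not in self.ids:
            raise InstanceNotFound(instance_id)
        self.ids.remove(instance_id)


def delete_instance(provider, record):
    """Delete an instance and always bring the local record in line."""
    try:
        provider.delete(record["instance_id"])
    except InstanceNotFound:
        # Someone else deleted it first; the desired end state is reached.
        pass
    # Update the DB entry in both cases so the deletion is not retried forever.
    record["active"] = False
    return record
```

Any other exception still propagates, so real failures stay visible while the "already deleted" case no longer loops.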
Updated by asmorodskyi over 3 years ago
- Status changed from In Progress to Resolved
The second attempt successfully fixed the issue.