Implement corporation standardization #8
For example, standardize variants such as LLC, LLP, and L.L.C. to a single string.
A bare-bones implementation can be found in is_exact_match in processors/corp_owners.py, but the Bay Area pipeline code has a much more robust implementation.
Here is a direct link to the bare-bones implementation:
https://git.coopcloud.tech/linnealovespie/aemp-seattle/src/branch/main/processors/corp_owners.py#L174-L199
Here is the relevant code from the Bay Area pipeline:
https://github.com/antievictionmappingproject/eb-data-pipeline/blob/mainline/processors/utilities/sf/transform_helpers.py#L382-L443
(This is the SF file, but the Oakland version is the same for this part.)
It would be very quick and easy to expand the replacements map. I think an aliasing feature might be ideal, but it isn't necessary.
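To make the replacements-map idea concrete, here is a minimal sketch in Python. It is not the code from either linked repository: the REPLACEMENTS entries and the standardize_corp_name function are hypothetical names chosen for illustration, and the map only covers a handful of suffixes. It assumes that uppercasing and stripping periods and commas is acceptable for matching purposes.

```python
import re

# Hypothetical replacements map (not taken from either pipeline):
# spelled-out or spaced suffix variant -> canonical suffix.
REPLACEMENTS = {
    "LIMITED LIABILITY COMPANY": "LLC",
    "LIMITED LIABILITY CO": "LLC",
    "L L C": "LLC",
    "LIMITED LIABILITY PARTNERSHIP": "LLP",
    "L L P": "LLP",
    "INCORPORATED": "INC",
    "CORPORATION": "CORP",
}


def standardize_corp_name(name: str) -> str:
    """Return an uppercased name with its trailing suffix canonicalized."""
    cleaned = name.upper()
    # Dropping periods and commas turns "L.L.C." into "LLC" and "INC." into "INC".
    cleaned = re.sub(r"[.,]", "", cleaned)
    # Collapse runs of whitespace so variants compare equal.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Only rewrite a variant when it is the final token(s) of the name,
    # so a leading word such as "INCORPORATED WIDGETS" is left alone.
    for variant in sorted(REPLACEMENTS, key=len, reverse=True):
        pattern = r"\b" + re.escape(variant) + r"$"
        cleaned = re.sub(pattern, REPLACEMENTS[variant], cleaned)
    return cleaned


def is_same_corp(a: str, b: str) -> bool:
    """Compare two owner names after standardization."""
    return standardize_corp_name(a) == standardize_corp_name(b)
```

Expanding coverage under this approach only means adding entries to the map; the matching logic stays the same.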
I looked into how this is done in LittleSis. In that code base, there is an aliasing feature so multiple names for an organization can be stored.
Here are some references in the LittleSis code to lists of business-related suffixes, but they are not used for substitution (other than for capitalization), since the variations are stored as aliases.
https://github.com/public-accountability/littlesis-rails/blob/main/app/utility/language.rb#L41
https://github.com/public-accountability/littlesis-rails/blob/main/app/utility/org_name.rb#L14
https://github.com/public-accountability/littlesis-rails/blob/main/app/models/org_names.rb#L35
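For comparison, the aliasing idea could look roughly like the sketch below. This is not how LittleSis implements it; AliasIndex and its methods are hypothetical names for illustration, and the in-memory dictionaries stand in for whatever storage the project would actually use. Instead of rewriting names, each observed variant is stored as an alias that points at one canonical name.

```python
from collections import defaultdict


class AliasIndex:
    """Hypothetical in-memory alias store: many name variants -> one canonical name."""

    def __init__(self):
        self._canonical_by_alias = {}
        self._aliases_by_canonical = defaultdict(set)

    @staticmethod
    def _key(name: str) -> str:
        # Lookup key ignores case and repeated whitespace only;
        # punctuation is kept, so each punctuated variant is its own alias.
        return " ".join(name.upper().split())

    def add_alias(self, canonical: str, alias: str) -> None:
        # The canonical name always resolves to itself as well.
        self._canonical_by_alias[self._key(canonical)] = canonical
        self._canonical_by_alias[self._key(alias)] = canonical
        self._aliases_by_canonical[canonical].add(alias)

    def resolve(self, name: str):
        """Return the canonical name, or None if the variant is unknown."""
        return self._canonical_by_alias.get(self._key(name))

    def aliases(self, canonical: str):
        return sorted(self._aliases_by_canonical.get(canonical, set()))


index = AliasIndex()
index.add_alias("Acme Holdings LLC", "ACME HOLDINGS, L.L.C.")
print(index.resolve("acme holdings, l.l.c."))
```

The trade-off is that aliases preserve the original spellings but must be recorded one by one, whereas a replacements map handles unseen variants of a known suffix automatically.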