Implement corporation standardization #8

Open
opened 2025-10-01 00:57:21 +00:00 by linnealovespie · 1 comment

E.g., standardize LLC, LLP, L.L.C, and similar names to a single string.
A bare-bones implementation can be found in `is_exact_match` in processors/corp_owners.py, but the Bay Area pipeline code has a much more robust implementation.

linnealovespie added this to the (deleted) project 2025-10-01 00:57:21 +00:00
jessib was assigned by linnealovespie 2025-12-10 21:17:17 +00:00
linnealovespie moved this to To Do in Evictorbook Code on 2025-12-10 21:18:16 +00:00
linnealovespie added this to the Evictorbook Code project 2025-12-10 21:21:28 +00:00
linnealovespie moved this to Todo in Evictorbook Code on 2025-12-10 21:21:46 +00:00
Collaborator

Here is a direct link to the bare-bones implementation:
https://git.coopcloud.tech/linnealovespie/aemp-seattle/src/branch/main/processors/corp_owners.py#L174-L199

Here is the relevant code from the Bay Area pipeline:
https://github.com/antievictionmappingproject/eb-data-pipeline/blob/mainline/processors/utilities/sf/transform_helpers.py#L382-L443
(This is the SF file, but the Oakland version is the same for this part.)

It would be very quick and easy to expand the replacements map. I think an aliasing feature might be ideal, but it is not necessary.
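To make the replacements-map idea concrete, here is a minimal sketch. The map entries and the `standardize_corp_name` helper are hypothetical and are not copied from either pipeline; which variants to include, and whether distinct entity types such as LLC and LLP should collapse to one string, would still need to be decided.

```python
import re

# Hypothetical replacements map (an assumption, not the Bay Area list).
# Keys are written as they appear after punctuation has been stripped.
SUFFIX_REPLACEMENTS = {
    "L L C": "LLC",
    "LIMITED LIABILITY COMPANY": "LLC",
    "L L P": "LLP",
    "LIMITED LIABILITY PARTNERSHIP": "LLP",
    "INCORPORATED": "INC",
    "CORPORATION": "CORP",
}


def standardize_corp_name(name: str) -> str:
    """Uppercase a name, strip punctuation, and apply the replacements map."""
    # Replace anything that is not a letter, digit, ampersand, or space.
    cleaned = re.sub(r"[^A-Z0-9& ]+", " ", name.upper())
    # Collapse runs of whitespace left behind by the step above.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    # Replace each variant wherever it appears as whole words.
    for variant, canonical in SUFFIX_REPLACEMENTS.items():
        cleaned = re.sub(rf"\b{re.escape(variant)}\b", canonical, cleaned)
    return cleaned
```

With this sketch, "Acme Holdings, L.L.C." and "acme holdings llc" both come out as "ACME HOLDINGS LLC", so an exact comparison on the standardized strings would treat them as the same owner. Note that it replaces variants anywhere in the name, not only at the end.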

I looked into how this is done in LittleSis. In that code base, there is an aliasing feature so multiple names for an organization can be stored.

Here are some references in the LittleSis code to lists of business-related suffixes. These lists are not used for substitution (other than for capitalization), since the variations are stored as aliases.
https://github.com/public-accountability/littlesis-rails/blob/main/app/utility/language.rb#L41
https://github.com/public-accountability/littlesis-rails/blob/main/app/utility/org_name.rb#L14
https://github.com/public-accountability/littlesis-rails/blob/main/app/models/org_names.rb#L35
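For comparison, an aliasing approach could look roughly like the sketch below. This is only an illustration of the idea of storing several names per organization; the `OrgAliases` class and its methods are hypothetical and do not reflect the LittleSis schema or anything in our code.

```python
from collections import defaultdict


class OrgAliases:
    """Hypothetical in-memory alias store: many names map to one organization."""

    def __init__(self):
        self._aliases = defaultdict(set)  # canonical name -> set of aliases
        self._canonical = {}              # uppercased name -> canonical name

    def add(self, canonical: str, alias: str) -> None:
        """Record `alias` as another name for `canonical`."""
        self._aliases[canonical].add(alias)
        self._canonical[alias.upper()] = canonical
        self._canonical[canonical.upper()] = canonical

    def resolve(self, name: str) -> str:
        """Return the canonical name, or the input unchanged if it is unknown."""
        return self._canonical.get(name.upper(), name)

    def aliases_for(self, canonical: str) -> list:
        """Return the stored aliases for a canonical name, sorted."""
        return sorted(self._aliases.get(canonical, set()))


store = OrgAliases()
store.add("ACME HOLDINGS LLC", "Acme Holdings, L.L.C.")
store.add("ACME HOLDINGS LLC", "Acme Holdings Limited Liability Company")
```

The trade-off is that every variation has to be stored explicitly, whereas a replacements map handles unseen variations as long as their suffix is in the map.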

Reference: linnealovespie/aemp-seattle#8