corporate name improvements. #16

Merged
jessib merged 4 commits from biz-name-jb-8 into main 2026-02-14 01:47:39 +00:00
Collaborator

#8

jessib added 1 commit 2025-12-30 04:27:23 +00:00

The code itself looks good! And especially handy for if/when we get more matches, we can just update the dict or even put it into a separate file. Makes me want to add a task for making a bare bones way to do testing.


For now, if you want to test the code by hand, you should be able to run `gre-llc.py` in the same folder and check the results. The desired result is that any pair of (name from the KC assessor data, name from the Sec of State data) that matches once standardized shows up in the "exact_match" list. LMK if that doesn't make sense; either way, I should probably write a better test script.
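Until there is a real test script, a bare-bones check along these lines might be enough. This is only a sketch: `placeholder_standardize` is a hypothetical stand-in for the actual standardization in `processors/corp_owners.py`, `find_exact_matches` is an assumed helper name, and the sample names are made up rather than taken from the assessor or Sec of State data.

```python
def placeholder_standardize(name):
    # Hypothetical stand-in for the real standardization logic:
    # uppercase, drop commas, and collapse runs of whitespace.
    return " ".join(name.upper().replace(",", "").split())


def find_exact_matches(pairs, standardize=placeholder_standardize):
    """Return the (assessor_name, sos_name) pairs whose standardized forms are equal."""
    return [(a, s) for a, s in pairs if standardize(a) == standardize(s)]


# Made-up sample pairs, for illustration only.
sample_pairs = [
    ("Example Holdings, LLC", "EXAMPLE HOLDINGS LLC"),  # should match once standardized
    ("Example Holdings LLC", "Other Company LLC"),      # should not match
]

exact_match = find_exact_matches(sample_pairs)
assert exact_match == [("Example Holdings, LLC", "EXAMPLE HOLDINGS LLC")]
```

Passing the real standardization function as the `standardize` argument would let the same check run against the code in this PR.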
jessib added 1 commit 2026-01-06 05:19:48 +00:00
Author
Collaborator

@linnealovespie : I am running `gre-llc.py` and the call at https://git.coopcloud.tech/linnealovespie/aemp-seattle/src/commit/8f250bbe4c3f2dba2c47b5bf7589d07283b34c2c/processors/corp_owners.py#L85

r = requests.post(search_for_business_url, get_business_search_payload(business_name, 100, page_num))

is returning a status of 400 with the text "System verification in progress, please wait." (I see the same thing if I go to https://ccfs-api.prod.sos.wa.gov/api/BusinessSearch/GetBusinessSearchList in a browser or with curl.) Before I dig into this more, I wanted to ask if there is a setup step or something else I'm missing. I could also write tests or find another way to check my changes, but thought I'd ask first.
jessib added 1 commit 2026-01-07 00:35:08 +00:00

Hmm no that's not expected behavior. Sometimes CCFS just moves around API access. However I dug around in the code for the connection string to the database where we have a data dump of the CCFS database, and it would be better to just use our own connection string instead.

I can work on refactoring to query our own copy of CCFS data ourselves now, or you can do it if you get to it first and I can give you the db connection string.
jessib added 1 commit 2026-01-22 03:01:37 +00:00
# Conflicts:
#	processors/corp_owners.py
jessib changed title from WIP: draft corporate name improvements. to draft corporate name improvements. 2026-02-08 17:03:45 +00:00
linnealovespie reviewed 2026-02-11 22:19:35 +00:00
@ -109,0 +118,4 @@
term = term.replace(",", "")
word_replace_map = {

This works well for now, but I'm wondering if in the future we should do something like keep a separate JSON file for this map if/when we find more pairings and this list gets super long.
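If the map does move to its own file, loading it could look roughly like the sketch below. The file name `word_replace_map.json` and the helper `load_word_replace_map` are assumptions for illustration, not something that exists in the repo.

```python
import json


def load_word_replace_map(path):
    """Load an abbreviation -> full-word map from a JSON object on disk."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Fail early if the file is not a flat JSON object of strings.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping abbreviations to words")
    return {str(k): str(v) for k, v in data.items()}
```

The file itself would then just be a JSON object such as `{"APTS": "APARTMENTS"}`, so new pairings can be added without touching the Python code.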

@ -109,0 +142,4 @@
}
for k, v in word_replace_map.items():
    term = term.replace(" " + k + " ", " " + v + " ")

Is there a reason for the spaces in this replace call? There should have been a string strip earlier in the stack, so whenever one of these keys is the last word of the term, the replace won't match because there is no space after it. E.g. if term="LA APTS", it won't standardize to "LA APARTMENTS".
Author
Collaborator

In line 111, it adds spaces to the beginning and end of each term, so "LA APTS" becomes " LA APTS ". I did this to have a cleaner way to find " apt " that wouldn't catch "captain", as the regexp-y way to match words in the bay area code felt more complicated than needed. The padded spaces then get stripped out after any replacements. Not saying this is the best way! But the spaces were intentional.
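For anyone reading along later, the padding approach can be sketched as below. This is a simplified illustration, not the code from the PR: `standardize_term` is a hypothetical name, and only the APTS pairing comes from this thread (the APT entry is made up to show the whole-word behavior).

```python
def standardize_term(term, word_replace_map):
    # Pad with spaces so every word, including the first and last,
    # has a space on both sides.
    term = " " + term.replace(",", "") + " "
    for k, v in word_replace_map.items():
        # Matching " KEY " only hits whole words, so "APT" does not
        # match inside "CAPTAIN".
        term = term.replace(" " + k + " ", " " + v + " ")
    # Remove the padding that was added at the start.
    return term.strip()


# Only the APTS entry is taken from the discussion; APT is hypothetical.
example_map = {"APT": "APARTMENT", "APTS": "APARTMENTS"}

assert standardize_term("LA APTS", example_map) == "LA APARTMENTS"
assert standardize_term("CAPTAIN APTS LLC", example_map) == "CAPTAIN APARTMENTS LLC"
```

One caveat with this sketch: because `str.replace` consumes the shared space between neighbors, a key that appears twice in a row (e.g. "APTS APTS") only gets its first occurrence replaced in a single pass.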


Oh I see! Makes sense then.

linnealovespie marked this conversation as resolved
linnealovespie requested review from linnealovespie 2026-02-13 00:27:27 +00:00
linnealovespie approved these changes 2026-02-13 00:27:49 +00:00
jessib changed title from draft corporate name improvements. to corporate name improvements. 2026-02-14 01:47:08 +00:00
jessib merged commit d5e3d8e2a0 into main 2026-02-14 01:47:39 +00:00
jessib deleted branch biz-name-jb-8 2026-02-14 01:47:39 +00:00
Reference: linnealovespie/aemp-seattle#16