corporate name improvements. #16
No description provided.
#8
The code itself looks good! And it's especially handy for if/when we get more matches: we can just update the dict or even put it into a separate file. Makes me want to add a task for making a bare-bones way to do testing.
For now, if you want to test the code by hand, running gre-llc.py in the same folder should let you test the results. The desired result is that any pair of (name from the KC assessor data, name from the Sec of State data) that matches when standardized shows up in the "exact_match" list. LMK if that doesn't make sense; either way I should probably write out a better test script.
@linnealovespie : I am running gre-llc.py and the call at
Hmm no, that's not expected behavior. Sometimes CCFS just moves around API access. However, I dug around in the code for the connection string to the database where we have a data dump of the CCFS database, and it would be better to just use our own connection string instead.
I can work on refactoring to query our own copy of CCFS data ourselves now, or you can do it if you get to it first and I can give you the db connection string.
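A bare-bones version of the hand test described above could look something like the sketch below. It does not call gre-llc.py or CCFS at all: `find_exact_matches` and `toy_standardize` are hypothetical names, the sample company names are made up, and the real standardization function would be passed in place of the placeholder.

```python
def find_exact_matches(assessor_names, sos_names, standardize):
    """Return (assessor name, Sec of State name) pairs that are equal once standardized."""
    # Index the Sec of State names by their standardized form.
    by_standardized = {}
    for sos_name in sos_names:
        by_standardized.setdefault(standardize(sos_name), []).append(sos_name)

    exact_match = []
    for assessor_name in assessor_names:
        for sos_name in by_standardized.get(standardize(assessor_name), []):
            exact_match.append((assessor_name, sos_name))
    return exact_match


def toy_standardize(name):
    # Placeholder only (an assumption): upper-case, drop commas, collapse whitespace.
    return " ".join(name.upper().replace(",", "").split())


# Made-up sample data; only the first pair should match after standardizing.
exact_match = find_exact_matches(
    ["ACME HOLDINGS, LLC", "UNRELATED CO"],
    ["Acme Holdings LLC", "Something Else Inc"],
    toy_standardize,
)
assert exact_match == [("ACME HOLDINGS, LLC", "Acme Holdings LLC")]
```

Keeping the standardizer as a parameter means the same check can be rerun unchanged whenever the replacement map grows.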
Changed title from "WIP: draft corporate name improvements." to "draft corporate name improvements."
@@ -109,0 +118,4 @@
term = term.replace(",", "")
word_replace_map = {
This works well for now, but I'm wondering if in the future we should do something like keep a separate JSON file for this map if/when we find more pairings and this list gets super long.
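If the map does move into its own JSON file, loading it could be as small as the sketch below. The file name `word_replace_map.json` and the function `load_word_replace_map` are hypothetical, and the single APTS entry is taken from the example discussed in this review; the demo writes a throwaway file only so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_word_replace_map(path):
    """Load the abbreviation -> replacement map from a JSON object on disk."""
    with Path(path).open(encoding="utf-8") as f:
        word_replace_map = json.load(f)
    # Fail early if the file holds a list or a bare value instead of an object.
    if not isinstance(word_replace_map, dict):
        raise ValueError("expected a JSON object of abbreviation -> replacement")
    return word_replace_map


# Demo with a temporary file; in the repo this would be a checked-in file.
with tempfile.TemporaryDirectory() as tmp:
    map_path = Path(tmp) / "word_replace_map.json"  # hypothetical file name
    map_path.write_text(json.dumps({"APTS": "APARTMENTS"}), encoding="utf-8")
    word_replace_map = load_word_replace_map(map_path)
```

New pairings then become a data change rather than a code change.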
@@ -109,0 +142,4 @@
}
for k,v in word_replace_map.items():
    term = term.replace(" " + k + " ", " " + v + " ")
Is there a reason for the spaces in this replace call? There should have been a string strip earlier in the stack, meaning that any time one of these keys shows up at the end, the replace won't match anything, since there is no space after the last word. E.g. if term="LA APTS", it won't standardize to "LA APARTMENTS".
In line 111, it adds spaces to the beginning and end of each term, so "LA APTS" becomes " LA APTS ". I did this to have a cleaner way to find " apt " that wouldn't catch "captain", as the regexp-y way to match words in the bay area code felt more complicated than needed. The padded spaces then get stripped out after any replacements. Not saying this is the best way! But the spaces were intentional.
Oh I see! Makes sense then.
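For anyone reading this thread later, the padding approach described above can be sketched as follows. This is an illustration rather than the code in the PR: `standardize_words` is a hypothetical name, and the APT entry is an assumed pairing added only to show that "CAPTAIN" is left alone.

```python
def standardize_words(term, word_replace_map):
    # Pad with spaces so the first and last words also have a space on both sides.
    term = " " + term.strip() + " "
    for k, v in word_replace_map.items():
        # Matching " KEY " only hits whole words, never the inside of a longer word.
        term = term.replace(" " + k + " ", " " + v + " ")
    # Remove the padding again once all replacements are done.
    return term.strip()


# APTS -> APARTMENTS comes from the review example; APT -> APARTMENT is an assumption.
word_replace_map = {"APTS": "APARTMENTS", "APT": "APARTMENT"}

at_end = standardize_words("LA APTS", word_replace_map)
inside_word = standardize_words("CAPTAIN APT", word_replace_map)
assert at_end == "LA APARTMENTS"
assert inside_word == "CAPTAIN APARTMENT"
```

One caveat with this approach: because adjacent words share a single space, `str.replace` will only rewrite the first of two back-to-back occurrences of the same key (for example "APT APT"). That is unlikely in company names, but it is worth knowing if the map grows.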
Changed title from "draft corporate name improvements." to "corporate name improvements."