As I was looking for an X3 recently, I wrote this Python script to scrape dealer websites and find available inventory. It ended up saving me some time clicking around in the browser. Given that, at the end of the day, this is a forum for hackrs and I’ve found a lot of super valuable information here, I’m making it available in case anyone finds it useful:
That’s awesome! My hat is off to you on being a coder and being able to figure out Python. In accounting school we had a Python project and let’s just say I was glad it was a group project.
Also, goes without saying, but contributions are welcome. I don’t expect to work on this in the coming days because (1) I need a break from this search, (2) I need to spend time enjoying my new car, and (3) now that I have a car, I’m less incentivised to improve the script unless people want to use it and need my help.
It really depends on where Autotrader is pulling data from. I ended up going to the dealer websites as I consider this to be the closest to “ground truth” I can get from public sources.
At first I scraped the inventory at bmwusa.com, which is supposed to be aggregated across dealers. It’s not hard to hit their API with a set of zip codes and a 100-mile radius. However, after doing that for a few days I noticed that it was lagging behind what dealers had on their own websites. This is an example of why it’s important to get data as close to the source as possible.
Exactly. I noticed that not all of my Volvo dealer’s cars are on cargurus. And Volvo loaners are missing most of the time, while being listed on dealers’ sites.
Nice work. Couple thoughts on how you can skip the manual dealer website entry: you can take input from a user for their zip code and the radius in miles they want to cover. Use that to hit Google maps api with keyword search for bmw dealers. Use the JSON response list to extract their website links.
This avoids scraping bmwusa.com and manually copy/pasting a dealer’s URL. Your application will be out of date pretty quickly if you have to manually maintain dealer URLs. Another nifty upgrade could be taking input from user for which car they are interested in. From the URLs you have in the code, there’s a pattern for how to link to specific model pages; you can use that to your advantage.
Not trying to nitpick. My full-time job is writing and reviewing code at a big N company, so I just wanted to throw out some suggestions.
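A rough sketch of both ideas, for what it’s worth: pull dealer website links out of a locator-style JSON response, then build a model-specific inventory URL for each one. The JSON shape (`results`, `name`, `website`), the `.example` hosts and the URL template are all made up for illustration; real responses and dealer URL patterns will differ.

```python
import json
from urllib.parse import quote, urlsplit

# Hypothetical URL pattern; each real dealer platform uses its own.
INVENTORY_TEMPLATE = "https://{host}/new-inventory/index.htm?model={model}"


def dealer_hosts(response_text):
    """Return unique dealer hostnames from a locator-style JSON response."""
    hosts = []
    for entry in json.loads(response_text).get("results", []):
        website = entry.get("website")
        if not website:
            continue  # some entries have no website listed
        host = urlsplit(website).netloc.lower()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def inventory_urls(response_text, model):
    """Build one model-specific inventory URL per dealer host."""
    return [
        INVENTORY_TEMPLATE.format(host=host, model=quote(model))
        for host in dealer_hosts(response_text)
    ]


# Invented sample response, standing in for whatever the locator API returns.
sample = json.dumps({"results": [
    {"name": "Dealer A", "website": "https://www.dealer-a.example/"},
    {"name": "Service Center", "website": None},
    {"name": "Dealer A (duplicate)", "website": "https://www.dealer-a.example/contact"},
    {"name": "Dealer B", "website": "https://www.dealer-b.example"},
]})

for url in inventory_urls(sample, "X3"):
    print(url)
```

Keeping the template in one place means a new model is just a different argument, not a new hard-coded URL.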
Sounds like you are in a good spot to send a pull request with some enhancements.
I’m out but skimmed the code and @coder8 what a great MVP! Thanks for sharing. I’m going to build this over the weekend and reserve other comments/contributions until then.
Edit: adjacent but I saw this today and so feel it, my favorite version of this and checks out
Use that to hit Google maps api with keyword search for bmw dealers.
I wouldn’t go to the Google Maps API, as free-text search might do more harm than good (haven’t tested this, but I bet it’s going to show me service centers, non-dealers, etc.). In fact, bmwusa.com has a REST API for that; it works pretty well and gives you a rich JSON result.
Then it’s just a matter of trial and error to identify which of the three dealer platforms we’re dealing with. Happy to take a look at your PR.
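The trial-and-error step could look roughly like this: take a dealer page’s HTML and check it for a string that gives away the platform. The platform names and marker strings below are placeholders, not the real three platforms; they would have to be filled in after inspecting actual dealer pages.

```python
# Placeholder names and markers (lowercase); replace with strings
# actually found in real dealer pages.
PLATFORM_MARKERS = {
    "platform_a": ("cdn.platform-a.example",),
    "platform_b": ("platform-b-inventory",),
    "platform_c": ('data-vendor="platform-c"',),
}


def detect_platform(html):
    """Return the first platform whose marker appears in the page, else None."""
    lowered = html.lower()
    for name, markers in PLATFORM_MARKERS.items():
        if any(marker in lowered for marker in markers):
            return name
    return None


page = '<script src="https://CDN.platform-a.example/app.js"></script>'
print(detect_platform(page))
```

A `None` result is the useful part: it flags dealers that need a manual look instead of silently scraping them with the wrong parser.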
Another nifty upgrade could be taking input from user for which car they are interested in.
Totally! This can be easily parametrized in the code as-is. But ideally this script is just a lambda that runs continuously on different models (or even makes) and indexes the results somewhere. Then if I can pull off some async VIN decoding, I can have super-advanced search features. This is what I needed personally, because none of the websites would let me say something like “I want a sapphire, mineral white or blue X3 with canberra or cognac leather and executive + driving assist plus within X miles from where I live”.
Nice! I automated my loaner hunt via Java but it doesn’t work 1/4 as well as this!
Decoder-wise, the best one I can think of is mdecoder. It’ll be possible to scrape the VIN, and then pass it to a UI output that links to mdecoder. You’d only need to pass the VIN.
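To make that concrete, here is a minimal sketch of the kind of query I mean, assuming each listing has already been scraped and VIN-decoded into a record. The `Vehicle` fields, the package names and the sample data are all invented for illustration; none of it reflects the script’s actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    # Hypothetical record produced by scraping plus VIN decoding.
    vin: str
    model: str
    exterior: str
    upholstery: str
    packages: set = field(default_factory=set)
    distance_miles: float = 0.0


def search(vehicles, model, exteriors, upholsteries, required_packages, max_miles):
    """Keep vehicles matching any listed color/leather and all required packages."""
    wanted_ext = {e.lower() for e in exteriors}
    wanted_uph = {u.lower() for u in upholsteries}
    required = {p.lower() for p in required_packages}
    hits = []
    for v in vehicles:
        have = {p.lower() for p in v.packages}
        if (v.model == model
                and v.exterior.lower() in wanted_ext
                and v.upholstery.lower() in wanted_uph
                and required <= have          # every required package is present
                and v.distance_miles <= max_miles):
            hits.append(v)
    return sorted(hits, key=lambda v: v.distance_miles)  # nearest first


both = {"Executive", "Driving Assistance Plus"}
inventory = [  # placeholder VINs, invented listings
    Vehicle("VIN-1", "X3", "Mineral White", "Cognac", both, 42.0),
    Vehicle("VIN-2", "X3", "Black", "Cognac", both, 10.0),
    Vehicle("VIN-3", "X3", "Sapphire", "Canberra", {"Executive"}, 15.0),
    Vehicle("VIN-4", "X3", "Blue", "Canberra", both, 18.0),
    Vehicle("VIN-5", "X3", "Blue", "Canberra", both, 250.0),
]

matches = search(
    inventory, "X3",
    exteriors=["sapphire", "mineral white", "blue"],
    upholsteries=["canberra", "cognac"],
    required_packages=["executive", "driving assistance plus"],
    max_miles=100,
)
print([v.vin for v in matches])
```

Colors and leathers are OR-ed within each list while packages are AND-ed, which is the combination the dealer sites wouldn’t let me express.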
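Something like this would do it, as a sketch: pull 17-character VINs out of the scraped page text (VINs never contain I, O or Q) and turn each into a decoder link. The `/decode/<VIN>` URL pattern is an assumption on my part, so check it against the site before relying on it; the VIN in the sample is a made-up placeholder.

```python
import re

# 17 characters, digits and capital letters except I, O and Q.
VIN_RE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")

# Assumed link format; verify against the decoder site.
DECODER_URL = "https://www.mdecoder.com/decode/{vin}"


def extract_vins(text):
    """Return unique VIN-shaped strings in order of first appearance."""
    return list(dict.fromkeys(VIN_RE.findall(text.upper())))


def decoder_link(vin):
    return DECODER_URL.format(vin=vin)


page_text = (
    "Stock #A123 VIN: 5UXAAAAAAAAA00001 "
    "(same car listed again: 5uxaaaaaaaaa00001) "
    "not a VIN: ABCDEFGHIJKLMNOPQ"
)

for vin in extract_vins(page_text):
    print(vin, decoder_link(vin))
```

The regex only checks the shape, so a stray 17-character stock code could slip through; validating the check digit would be the next step if that turns out to be a problem.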
At 99.9% of companies of any size, the actual push looks just like this (a one-person stunt show), but there are oppositely-themed Cirque du Soleil performances going on in the background (e.g. fire dancers and water ballerinas), and it’s award season, so all of those performers want the spotlight.
Quite the opposite, actually. I love what I do, but in order to keep loving it I need time away from it, which means no code contributions on weekends, period.
Good point. I just tried it myself and you would have to implement filtering to get what you want. Not worth it, especially when you found BMW’s own API.
If your plan is to host this someday, running it continuously would get really expensive really fast. I would advise going for a pull-based model so your compute costs stay minimal.
Agree that NLP searching would add great value. Excited to see this take shape, good luck!
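One way to read “pull-based” here, as a sketch: scrape a dealer only when somebody actually asks for it, and reuse the result until it goes stale. The `PullCache` class, the fake scraper and the 10-minute TTL are all hypothetical; a hosted version would more likely keep the cache in a database than in memory.

```python
import time


class PullCache:
    """Scrape a dealer only on request, and reuse results while they are fresh."""

    def __init__(self, scrape, ttl_seconds=3600, clock=time.monotonic):
        self._scrape = scrape      # callable: dealer_url -> list of listings
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the demo can fake time
        self._entries = {}         # dealer_url -> (fetched_at, listings)

    def get(self, dealer_url):
        now = self._clock()
        cached = self._entries.get(dealer_url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: no scrape, no compute
        listings = self._scrape(dealer_url)
        self._entries[dealer_url] = (now, listings)
        return listings


# Demo with a fake scraper and a fake clock instead of real requests.
calls = []
fake_now = [0.0]


def fake_scrape(url):
    calls.append(url)
    return [f"listing from {url}"]


cache = PullCache(fake_scrape, ttl_seconds=600, clock=lambda: fake_now[0])
cache.get("https://dealer.example")   # first request triggers a scrape
cache.get("https://dealer.example")   # served from the cache
fake_now[0] = 601.0
cache.get("https://dealer.example")   # stale by now, scraped again
print(len(calls))  # prints 2
```

Cost then scales with how many people search rather than with how many dealers and models exist.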