Multi-brand new/used car inventory search: the evolution of a Python script into a web scraper and then an API-driven web app

As I was looking for an X3 recently, I wrote this Python script to scrape dealer websites and find available inventory. This ended up saving me some time clicking around in the browser. Given that, at the end of the day, this is a forum for hackers and I’ve found a lot of super valuable information here, I’m making this available in case anyone would find it useful:

@RustyDaemon, 07/18/2022 - latest version for this evolved beast. Nationwide search, 40+ brands, 500+ dealers/dealer groups, new/used listings.
http://carscrapper.azurewebsites.net/

56 Likes

Now this is some serious hacking, I like it!

4 Likes

That’s awesome! My hat is off to you on being a coder and being able to figure out Python. In Accounting school we had a Python project and let’s just say I was glad it was a group project.

2 Likes

Now there is someone putting money where his mouth is, not like some users in “best deals list” thread

9 Likes

command to execute the script ?

command to execute the script ?

It’s all in the readme

Also, goes without saying, but contributions are welcome. I don’t expect to work on this in the coming days because (1) I need a break from this search (2) I need to spend time enjoying my new car and (3) Now that I have a car, I’m less incentivised to improve it unless people want to use it and need my help.

4 Likes

This is great, good work. I wonder how much more accurate it is than going through Autotrader etc.

It really depends on where Autotrader is pulling data from. I ended up going to the dealer websites, as I consider them the closest to “ground truth” I can get from public sources.

At first I scraped the inventory at bmwusa.com, which is supposed to be aggregated across dealers. It’s not hard to hit their API with a set of zip codes and a 100-mile radius. However, after doing that for a few days I noticed that it was lagging behind what dealers had on their websites. This is an example of why it’s important to get data as close to the source as possible.

9 Likes

Exactly. I noticed that not all of my Volvo dealer’s cars are on cargurus. And Volvo loaners are missing most of the time, while being listed on dealers’ sites.

Nice work. A couple of thoughts on how you can skip the manual dealer website entry: you can take input from a user for their zip code and the radius in miles they want to cover. Use that to hit Google maps api with keyword search for bmw dealers. Then use the JSON response to extract their website links.

This avoids scraping bmwusa.com and manually copy/pasting each dealer’s URL. Your application will be out of date pretty quickly if you have to manually maintain dealer URLs. Another nifty upgrade could be taking input from user for which car they are interested in. From the URLs you have in the code, there’s a pattern for linking to specific model pages, and you can use that to your advantage.

Not trying to nitpick :slight_smile: My full time job is writing and reviewing code at big N company so just wanted to throw out some suggestions.

6 Likes

Sounds like you are in a good spot to send a pull request with some enhancements.

I’m out but skimmed the code and @coder8 what a great MVP! :clap:t2::clap:t2: Thanks for sharing. I’m going to build this over the weekend and reserve other comments/contributions until then.

Edit: adjacent but I saw this today and so feel it, my favorite version of this and :100: checks out

https://twitter.com/oliagavrysh/status/1276767355515727873?s=21

3 Likes

Thanks for the suggestions!

Use that to hit Google maps api with keyword search for bmw dealers.

I’d not go to the Google Maps API, as free-text search might do more harm than good (haven’t tested this, but I bet it’s going to show me service centers, non-dealers etc). In fact, bmwusa.com has a REST API for that and it works pretty well and gives you a rich JSON result.

curl -XGET 'https://www.bmwusa.com/api/dealers/02140/100'

Then it’s just a matter of trial-and-error to identify which of the three dealer platforms we’re dealing with. Happy to take a look at your PR :slight_smile:
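
For anyone who would rather make that call from Python than from curl, here is a minimal sketch using only the standard library. The URL shape (`/api/dealers/<zip>/<radius>`) is taken from the curl command above; the helper names `dealer_api_url` and `fetch_dealers` are made up for illustration, and since the response schema isn't described in this thread the parsed JSON is returned as-is rather than picked apart.

```python
import json
import urllib.request

# Endpoint taken from the curl command in this thread.
BASE_URL = "https://www.bmwusa.com/api/dealers"


def dealer_api_url(zip_code: str, radius_miles: int) -> str:
    """Build the dealer-lookup URL: <BASE_URL>/<zip>/<radius>.

    Zip codes are kept as strings so leading zeros (e.g. "02140") survive.
    """
    if not (len(zip_code) == 5 and zip_code.isdigit()):
        raise ValueError(f"expected a 5-digit zip code, got {zip_code!r}")
    if radius_miles <= 0:
        raise ValueError("radius must be a positive number of miles")
    return f"{BASE_URL}/{zip_code}/{radius_miles}"


def fetch_dealers(zip_code: str, radius_miles: int, timeout: float = 10.0):
    """Fetch and parse the JSON response (hypothetical helper).

    The structure of the result is not documented here, so it is
    returned untouched for the caller to inspect.
    """
    request = urllib.request.Request(
        dealer_api_url(zip_code, radius_miles),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_dealers("02140", 100)` would issue the same request as the curl line; looping over several zip codes is then a plain `for` loop.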

Another nifty upgrade could be taking input from user for which car they are interested in.

Totally! This can be easily parametrized in the code as-is. But ideally this script is just a lambda that runs continuously on different models (or even makes) and indexes the results somewhere. Then if I can pull off some async VIN-decoding, I can have some super-advanced search features. This is what I needed personally, because none of the websites would let me say something like “I want a sapphire, mineral white or blue X3 with canberra or cognac leather and executive + driving assist plus within X miles from where I live”.
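
To make that kind of query concrete, here is a rough sketch of filtering over already-indexed listings. Everything about the record shape is an assumption: the `Listing` fields, the idea that options arrive as a set of package names, and the substring matching on color names are all invented for illustration, not what the script currently produces.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Listing:
    """Hypothetical shape of one indexed listing; field names are assumptions."""
    vin: str
    model: str
    exterior: str
    upholstery: str
    packages: FrozenSet[str]
    distance_miles: float


def _contains_any(text: str, needles: Iterable[str]) -> bool:
    """Case-insensitive check: does text contain at least one needle?"""
    lowered = text.lower()
    return any(needle.lower() in lowered for needle in needles)


def search(
    listings: Iterable[Listing],
    *,
    model: str,
    exteriors: Iterable[str],
    upholsteries: Iterable[str],
    required_packages: Iterable[str],
    max_miles: float,
) -> List[Listing]:
    """Keep listings matching the model, ANY listed color and upholstery,
    ALL required packages, and the distance limit."""
    exteriors = list(exteriors)
    upholsteries = list(upholsteries)
    required = {p.lower() for p in required_packages}
    return [
        item
        for item in listings
        if item.model.lower() == model.lower()
        and _contains_any(item.exterior, exteriors)
        and _contains_any(item.upholstery, upholsteries)
        and required <= {p.lower() for p in item.packages}
        and item.distance_miles <= max_miles
    ]
```

The example from the post would then read as `search(listings, model="X3", exteriors=["sapphire", "mineral white", "blue"], upholsteries=["canberra", "cognac"], required_packages=["Executive", "Driving Assistance Plus"], max_miles=100)`, assuming the packages are spelled that way in the decoded data.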

4 Likes

Nice! I automated my loaner hunt via java but it doesn’t work 1/4 as well as this!

Decoder wise, the best one I can think of is mdecoder. It’ll be possible to scrape the VIN and then pass it to a UI output that links to mdecoder. You’d only need to pass the VIN.

https://www.mdecoder.com/decode/wbaug51010pv27851
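
A small sketch of the linking idea: turn a scraped VIN into an mdecoder link that the UI can display. The URL pattern is copied from the example link above (lowercase VIN after `/decode/`); the function name `mdecoder_url` is made up, and the format check only relies on the standard 17-character VIN alphabet, which excludes the letters I, O and Q. It only builds a link and does not fetch the page.

```python
import re

# Standard VINs are 17 characters and never contain I, O or Q.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def mdecoder_url(vin: str) -> str:
    """Return a link to the mdecoder page for this VIN (hypothetical helper).

    The path format follows the example link in this thread.
    """
    cleaned = vin.strip().upper()
    if not VIN_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a well-formed 17-character VIN: {vin!r}")
    return f"https://www.mdecoder.com/decode/{cleaned.lower()}"
```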

1 Like

I don’t see any actual fires being put out, or people burning in the background.

Source: unicorn office visits

At 99.9% of companies of any size, the actual push looks just like this (one-person stunt show), but there are oppositely-themed Cirque du Soleil performances going on in the background (e.g. fire dancers and water ballerinas) — and it’s award season so all of those performers want the spot light.

2 Likes

Decoder wise, the best one I can think of is mdecoder.

Unfortunately it’s got recaptcha… And mortals like me usually can’t get around that.

1 Like

Quite the opposite actually. I love what I do, but in order to keep loving it I need time away from it, which means no code contributions on weekends, period.

Good point. I just tried it myself and you would have to implement filtering to get what you want. Not worth it, especially since you found BMW’s own API.

If your plan is to host this someday, this would get really expensive really fast. I would advise going for a pull-based model so your compute costs stay minimal.
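
One possible reading of "pull based", sketched under that assumption: instead of scraping continuously, scrape only when a user asks, and reuse the result for a while so repeated searches cost nothing. The class name `TTLCache`, the 15-minute default, and the `fetch` callback are all illustrative choices, not part of the existing script.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Fetch on demand and reuse results until they are ttl_seconds old."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        ttl_seconds: float = 900.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # e.g. a function that scrapes one dealer
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can fake time
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        """Return the cached value for key, refetching only if it is stale."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value
```

With this shape, compute is spent only when someone actually searches, at the price of the first request for each key being slow.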

Agree that NLP searching would add great value. Excited to see this take shape, good luck! :slight_smile:

2 Likes

I actually accept the challenge to fit all of this inside the AWS free tier. Reminds me of this nice article.

4 Likes

You beat me to the punch on this as well… Nice work! Sending you a PM shortly. Definitely have some questions for you.

That article has a missing white space that won’t pass the linter. Just kidding of course. Very impressed with everyone’s technical knowledge here.