Multi-brand new/used car inventory search: the evolution of a Python script into a web scraper and then an API-driven web app

I haven’t used Swagger before. The error message I get is:

TypeError: Failed to execute 'fetch' on 'Window': Request with GET/HEAD method cannot have body.

Normally GET requests pass their arguments as query parameters. Would changing the API to POST fix it in Swagger?

The curl commands generated by swagger work fine in either case. What database of dealers is being used?

Swagger doesn’t seem to work for invoking methods that take parameters from the request body, but it’s there just for the API schema anyway. Use Postman or plain curl for invoking.

Agreed, normally I would do it that way, but I need a relatively complex type as a parameter, so I decided to implement it this way.

There is no database of dealers. It’s a scraping tool that hits live dealer websites for information. Dealer info is stored in a config file. Any dealer whose website is compatible with one of the 3 implementations can be added to the config.

Yeah, that is true for GET requests. It sounds like it can do it for POST requests.

Yeah, I meant which dealer sites are in the config file for the API?

Async REST services are here! :drum: :drum: :drum:

Here is how to consume them:

You call the service to start a search and get back a result key plus the endpoint where results will be available when ready. Then you poll that result endpoint periodically until the results are ready.

Details:

  1. We now have 2 operations: “StartSearch”, which kicks off a search, and a results operation that you poll with the search key.

  2. First you invoke the “StartSearch” operation. It synchronously returns a “202 Accepted” status with ticket information, while starting the actual search asynchronously in the background.

    You get back a searchKey for results retrieval, retryAfter to tell you how often you should poll the result endpoint, and the result endpoint URI.

  3. Start polling the results endpoint periodically (but no more often than retryAfter suggests), supplying searchKey as a parameter. If the results aren’t ready yet, you get a “202 Accepted” code with a payload indicating that the search is still in progress.

  4. Keep polling that endpoint until you get search results. You get a payload with either a success status and an array of results, or a failure status and an error message. In both cases the code is 200 OK.

  5. Display the results in your favorite UI using your favorite UI framework.
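For anyone who wants to script against it, the start-then-poll flow above can be sketched roughly like this in Python. This is only a sketch under assumptions: the ticket field names (`searchKey`, `retryAfter`, `resultsUri`), retryAfter being in seconds, and the `max_polls` safety cap are my guesses for illustration, not the service’s confirmed contract. The two HTTP calls are passed in as functions so the loop doesn’t depend on any particular HTTP library.

```python
import time
from typing import Any, Callable, Dict, Tuple

# A response is modelled as (HTTP status code, decoded JSON payload).
Response = Tuple[int, Dict[str, Any]]


def run_search(
    start_search: Callable[[], Response],
    get_results: Callable[[str, str], Response],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 600,  # hypothetical client-side safety cap
) -> Dict[str, Any]:
    """Start a search, then poll the results endpoint until it answers 200."""
    status, ticket = start_search()
    if status != 202:
        raise RuntimeError(f"expected 202 Accepted from StartSearch, got {status}")

    # Assumed field names; adjust to whatever the real ticket payload uses.
    search_key = ticket["searchKey"]
    retry_after = float(ticket["retryAfter"])  # assumed to be seconds
    results_uri = ticket["resultsUri"]

    for _ in range(max_polls):
        sleep(retry_after)  # never poll faster than the server suggested
        status, payload = get_results(results_uri, search_key)
        if status == 202:
            continue  # search still in progress, keep polling
        if status == 200:
            return payload  # success or failure payload; caller inspects it
        raise RuntimeError(f"unexpected status {status} from results endpoint")
    raise TimeoutError(f"no results after {max_polls} polls")
```

Since both success and failure come back as 200 OK, the caller still has to look at the status inside the returned payload before using the results.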

Please feel free to test it out and report any problems you find. I’ll push the latest code to Git shortly.

Edit: changed the search service to POST for better compatibility with clients built on JS frameworks.

Cool. Is this implemented with some type of async Azure primitive (functions)?

No, nothing Azure-specific in the implementation, just an ASP.NET Web API controller on .NET Core 3.1.

The async part comes from the usage pattern, just without redirecting to additional resource URIs.

Check out my fancy new fast UI for working with the async REST services pattern, written entirely client-side in jQuery :fire: :fire: :fire: :fire:

  • still totally real-time searches, no stale car data here lol
  • searches never time out, no matter how long it takes to scrape all dealers for info
  • client-side paging, sorting and insanely useful FILTERING on all results you get: lightning-fast search across all columns in the grid, e.g. typing “xdr blue FWD” filters the results down to only xDrive FWD cars in blue. Try it on all columns: VIN, stock, engine etc.
  • real-time search statistics showing how the search is initiated, how it polls for results, and when it finishes
  • fancy spinning indicator, always my proudest accomplishment in any UI work, yeah baby!
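The all-columns filter is easy to picture in code. The real UI does this client-side in jQuery; the Python below is just a sketch of the idea, assuming each whitespace-separated term must appear as a case-insensitive substring somewhere in the row. That matching rule and the column names in the usage example are assumptions for illustration, not taken from the actual grid.

```python
from typing import Any, Dict, List


def filter_rows(rows: List[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Keep rows where every query term appears in at least one column."""
    terms = query.lower().split()
    matched = []
    for row in rows:
        # Join all column values so a term can match any column.
        haystack = " ".join(str(value) for value in row.values()).lower()
        if all(term in haystack for term in terms):
            matched.append(row)
    return matched


# Usage with made-up rows: "xdr" matches "xDrive", "blue" the color,
# "fwd" the drivetrain column.
rows = [
    {"model": "X3 xDrive30i", "color": "Phytonic Blue", "drive": "FWD"},
    {"model": "X3 xDrive30i", "color": "Black", "drive": "AWD"},
]
blue_fwd = filter_rows(rows, "xdr blue FWD")
```

An empty query keeps every row, which matches how a cleared search box usually behaves.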

It’s live now in place of the old UI:

http://carscrapper.azurewebsites.net/

Give it a spin and let me know.

Edit: just noticed dealerCom BMW searches return unrelated models along with the model being searched for. I’ll look into it.

Not sure if it’s all of them, but some DealerInspire websites have an XML sitemap like this one:
https://www.bmwofbloomington.com/dealer-inspire-inventory/inventory_sitemap
You can simply parse each URL to get year/model/VIN and make another async request to get details if you want.
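A rough Python sketch of that sitemap idea follows. It assumes the sitemap uses the standard sitemaps.org `<urlset>`/`<url>`/`<loc>` layout and that the listing URLs carry a four-digit year and a 17-character VIN somewhere in them. Both are assumptions, and the example URL in the comment is made up. Model extraction is left out because it depends on the exact URL slug layout, which I haven’t verified.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Standard sitemap namespace (assumed to be what the dealer site emits).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# VINs are 17 characters and never contain I, O or Q.
VIN_RE = re.compile(r"(?<![A-Za-z0-9])[A-HJ-NPR-Z0-9]{17}(?![A-Za-z0-9])")
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def listing_urls(sitemap_xml: str) -> List[str]:
    """Return every <loc> URL found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def parse_listing_url(url: str) -> Dict[str, Optional[str]]:
    """Pull a VIN and model year out of a listing URL, if present.

    Hypothetical URL shape used for illustration:
    https://dealer.example/inventory/new-2020-bmw-x3-xdrive30i-5UXTY5C05L9B12345/
    """
    upper = url.upper()
    vin_match = VIN_RE.search(upper)
    vin = vin_match.group(0) if vin_match else None
    # Drop the VIN first so digits inside it are never mistaken for a year.
    rest = upper.replace(vin, "") if vin else upper
    year_match = YEAR_RE.search(rest)
    return {"url": url, "year": year_match.group(0) if year_match else None, "vin": vin}
```

URLs that yield a VIN could then be queued for a follow-up detail request, while anything without one (about pages and so on) gets skipped.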

Cool I’ll check it out. Thanks

Edit: just fixed the BMW search. What’s interesting is that 2 existing dealerCom dealers changed their websites:

Long Beach BMW went from

https://www.longbeachbmw.com/new-bmw/long-beach.htm?superModel={0}

to

https://www.longbeachbmw.com/new-bmw/long-beach.htm?model={0}

Sterling BMW went from the standard dealerCom URL structure

www.sterlingbmw.com/new-inventory/index.htm?model={0}

to a custom URL

www.sterlingbmw.com/new-inventory/pageSizeChange/1/10/~/VehicleType_~Model_{0}~Trim_~Year_~Price1_~TransmissionGeneric_~ExteriorColorGeneric_~InteriorColor_~EPAHighway_/~/100

On the plus side, Sterling’s new URL can set the page size in the URL itself, which removes the need to programmatically determine paging (which is not that reliable) and crawl each page to collect all stock.
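To show how the `{0}` placeholder in those URLs gets filled in, here is a small Python sketch. The two templates are copied from the post; the dictionary layout and the `build_search_url` helper are hypothetical, and the real tool runs on .NET, where `{0}` is presumably filled by something like `string.Format`. Python’s `str.format` happens to accept the same positional placeholder.

```python
from urllib.parse import quote

# Hypothetical config layout: dealer name -> search URL template.
# The templates themselves are the new ones quoted in the post.
DEALERS = {
    "Long Beach BMW": "https://www.longbeachbmw.com/new-bmw/long-beach.htm?model={0}",
    "Sterling BMW": (
        "www.sterlingbmw.com/new-inventory/pageSizeChange/1/10/~/"
        "VehicleType_~Model_{0}~Trim_~Year_~Price1_~TransmissionGeneric_"
        "~ExteriorColorGeneric_~InteriorColor_~EPAHighway_/~/100"
    ),
}


def build_search_url(template: str, model: str) -> str:
    """Substitute the URL-encoded model name into the {0} placeholder."""
    return template.format(quote(model, safe=""))


long_beach_url = build_search_url(DEALERS["Long Beach BMW"], "X3")
sterling_url = build_search_url(DEALERS["Sterling BMW"], "X5")
```

Keeping the whole URL in config means a change like Long Beach’s `superModel` to `model` rename is a one-line config edit rather than a code change.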

I noticed one of my local MB sites did something similar. It seemed to come with the last update to their inventory, and they moved a couple of other links around.

Yup, this is unfortunate if dealers do it often enough. This change, which was a broken filter on the dealer’s side, was easy enough to spot because the results included cars not relevant to the search. But if a URL changes drastically, some sites might start erroring out and stop providing results completely. Those would be hard to spot. Gotta periodically check the logs to catch offenders like that.
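One way to make that periodic check less manual is to flag dealers whose last few scrapes all came back empty or errored. This is only a sketch of the idea: the `history` structure (dealer name mapped to per-search result counts, with `None` meaning the scrape raised an error) and the three-run threshold are invented for illustration and are not how the tool actually logs.

```python
from typing import Dict, List, Optional


def suspicious_dealers(
    history: Dict[str, List[Optional[int]]],
    min_runs: int = 3,
) -> List[str]:
    """Return dealers whose last `min_runs` scrapes were all empty or failed.

    `history` maps dealer name -> result counts, oldest first; None marks
    a scrape that errored out.
    """
    flagged = []
    for dealer, counts in history.items():
        recent = counts[-min_runs:]
        # Only judge dealers that have enough recent runs to look at.
        if len(recent) == min_runs and all(c is None or c == 0 for c in recent):
            flagged.append(dealer)
    return sorted(flagged)
```

A dealer can legitimately have zero stock of a rare model, so a flag here is a prompt to go look at the site, not proof that its URL changed.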

Added a bunch of tri-state Hyundai dealers. 90% of those are plug-and-play dealerCom sites, excellent uniformity. A search for Sonata, for example, is sitting at around 600 units.

A few updates:

  • added dynamic hiding of empty grid columns to save space
  • added detection of packages for dealerCom listings. Now there is an option to filter results down to a specific package.

    Since I’m scraping this info from the dealer’s search page, not all packages might be there. You can click an individual listing link to see all packages, if they are also listed on the dealer’s details page.

I’m going on vacation tomorrow, so most likely there will be no updates for the next week. Have fun with the search, everyone, and let me know what else we can add to it.

Enjoy vacation. I am going on vacation this week too. But ironically, that means more hacking time for me, not less :grinning:

Thanks and enjoy yours too. Honestly, if it were up to me, my vacation would be sitting at home coding/sleeping/gaming… :sob:

Update, probably the last for the next week or so:

added 30 tri-state and PA Audi dealers for all your East Coast Audi searching needs :laughing:

Audi Allentown
Audi Bridgewater
Audi Brooklyn
Audi Devon
Audi Eatontown
Audi Fort Washington
Audi Freehold
Audi Greenwich
Audi Hawthorne
Audi Hunt Valley
Audi Mechanicsburg
Audi Mendham
Audi Newton
Audi of Huntington
Audi Princeton
Audi Reading
Audi State College
Audi Warrington
Audi West Chester
Audi Wynnewood
Audi Wyoming Valley
Bell Audi
Biener Audi
DCH Millburn Audi
Fiore Audi
Flemington Audi
Jack Daniels Audi of Paramus
Jack Daniels Audi of Upper Saddle River
Mohegan Lake Audi
Town Audi

Back from vacation, back to the grind. Added the ability to search by a specific region, or across all listings nationally, if desired.

Going forward, each dealer added to the config needs its region specified at that time.
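In config terms, the region requirement could look something like the Python sketch below. The dictionary layout, the region labels and the helper names are all hypothetical; the point is just that a dealer without a region gets rejected up front, and that passing no region means a national search.

```python
from typing import Dict, List, Optional


def validate_dealer(dealer: Dict[str, str]) -> None:
    """Reject config entries that are missing a region."""
    if not dealer.get("region"):
        raise ValueError(f"dealer {dealer.get('name', '?')!r} has no region")


def dealers_for_search(
    dealers: List[Dict[str, str]],
    region: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Return dealers in `region`, or all dealers for a national search."""
    for dealer in dealers:
        validate_dealer(dealer)
    if region is None:
        return list(dealers)  # national search: every configured dealer
    wanted = region.casefold()
    return [d for d in dealers if d["region"].casefold() == wanted]


# Illustrative entries; the region labels are made up for this example.
DEALERS = [
    {"name": "Audi Brooklyn", "region": "Tri-State"},
    {"name": "Audi Allentown", "region": "PA"},
]
tri_state = dealers_for_search(DEALERS, "tri-state")
```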
