Automatically extracting RV data

1. Well, if we’re talking over multiple model years, in ’17 they had crazy high 24-month Mercedes RVs. IIRC Michael scored a crazy E-Class deal; I wasn’t here during that time.

2. :crystal_ball: This is like trying to predict programs; @RVguy would be the guy to talk to.

My robotics club did something similar using Google Cloud for PDF-to-text. It’s free for the first 1k images. Does Azure have something similar?

Python-pdfminer seems a lot more powerful. I’ve been playing with the included pdf2txt to dump HTML, but I haven’t been able to get it to group the elements of each row together, presumably because they are so far apart on the page. But I suspect that is possible, and possibly easy, from the Python API.
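For what it's worth, here is a minimal sketch of the row-grouping idea in plain Python. It assumes each text fragment has already been pulled out of the PDF as a hypothetical `(x, y, text)` tuple (how you get those out of pdfminer is not shown here), and it puts fragments on the same row when their y coordinates are within a tolerance. The function name `group_into_rows`, the `y_tolerance` default, and the sample fragments are all made up for illustration.

```python
def group_into_rows(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into rows of text, top to bottom.

    Fragments whose y coordinate is within `y_tolerance` of the first
    fragment in the current row are treated as belonging to that row.
    """
    rows = []  # each entry: {"y": anchor y of the row, "items": [(x, text), ...]}
    # PDF y coordinates grow upward, so sort by descending y to read top-down.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1]["y"] - y) <= y_tolerance:
            rows[-1]["items"].append((x, text))
        else:
            rows.append({"y": y, "items": [(x, text)]})
    # Inside each row, order the cells left to right by x.
    return [[text for _, text in sorted(row["items"])] for row in rows]


# Hypothetical fragments: two table rows whose cells are far apart horizontally
# and whose baselines differ by a fraction of a point.
fragments = [
    (400.0, 700.2, "62%"),
    (50.0, 700.0, "Model A"),
    (50.0, 685.0, "Model B"),
    (400.0, 685.4, "58%"),
    (300.0, 699.8, "24 mo"),
    (300.0, 685.1, "24 mo"),
]

rows = group_into_rows(fragments)
for row in rows:
    print(" | ".join(row))
```

Grouping on the vertical position rather than on horizontal distance is the point here: cells in the same row can be arbitrarily far apart and still end up together. The tolerance would need tuning for a real document.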

Cool, I’ll check out Google’s cloud thing. There seem to be a ton of cloud offerings.

Passing -L 5000 seemed to convince pdf2txt to put everything on the same line. I’ll play with that some more tonight or over lunch.

This might be extremely interesting, but the issue is that the data collection is going to be very time-consuming if you want it to reflect RVs for all banks.

Other than the Edmunds forums, which can help you with a few ZIP codes and models at a time at most, it is going to be very difficult to collect the data needed to make these calculations for captive lenders such as BMWFS and VCFS.

Agreed completely. My plan is just to analyze the Ally data since it’s there and see if we can do anything useful with it.

I’d love captive data, but that’s more complex, of course. Maybe we can use the Edmunds Deals? I also just saw that this morning. @RustyDaemon want to start scraping that too? :laughing:

I tried Edmunds Deals via that link, but could not see MF info… I’ll look at it later.

It’s hard to find at first, but I was able to get it to work using the instructions in the post. Let me know if you keep having trouble and I can take some more screenshots.

Due to my job, I’ve got all the historical data on every lender in most regions. RVs, rates, residuals and all applicable rebates. Going back about 6 yrs.

I have tried forecasting each component on every lender and found that there is nothing predictive within the Ally RVs themselves. Their adjustments are all done relative to ALG’s RVs and those adjustments (usually in the +5 to +8 range) occasionally change in tandem with their rate changes or if they push into a new brand.

Plus there are very few models currently where Ally has the best program.

US Bank is almost always -1 or -2 relative to ALG. But their rates are changing more frequently at the model level to feather volume.
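To make the "relative to ALG" arithmetic concrete, here is a rough sketch. It assumes the adjustments are percentage points added to the ALG residual, which is my reading of the numbers above. The ALG baseline of 55% and the $40,000 MSRP are invented purely for illustration, and the function names are hypothetical.

```python
def lender_rv(alg_rv_pct, adjustment_pts):
    """Lender residual percentage: ALG baseline plus the lender's point adjustment."""
    return alg_rv_pct + adjustment_pts


def residual_dollars(msrp, rv_pct):
    """Residual value in dollars for a given MSRP and residual percentage."""
    return msrp * rv_pct / 100


# Illustrative inputs only (not real program data).
alg_rv = 55    # hypothetical ALG residual, in percent of MSRP
msrp = 40000   # hypothetical MSRP in dollars

ally_rv = lender_rv(alg_rv, +5)     # low end of the +5 to +8 range mentioned above
usbank_rv = lender_rv(alg_rv, -2)   # the -2 case mentioned above

ally_residual = residual_dollars(msrp, ally_rv)
usbank_residual = residual_dollars(msrp, usbank_rv)

print(f"Ally: {ally_rv}% of MSRP = ${ally_residual:,.0f}")
print(f"US Bank: {usbank_rv}% of MSRP = ${usbank_residual:,.0f}")
```

With these made-up inputs, the 7-point spread between the two lenders works out to $2,800 of residual on the same car, which is why the adjustment matters more than the ALG number itself when comparing programs.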

There is a fairly cheap desking tool called LeaseScan that dealers pay about $1k/mo for, and it may give you access to all the programs around the country. I’m sure they would license it to a broker. From the screenshots and a YouTube demo of it back in 2011, it’s pretty retro and I doubt much has changed.


Haven’t had much time to work on this, but finally got pdfminer to do what I want.

Here’s a Google Colab notebook where you can see the data in an almost structured format.

You can also modify it via the web interface very easily if you want to work on it.

Maybe a stupid question, but I’ve always asked myself: if the Edmunds forums have RVs for vehicles, wouldn’t it be much easier for them to offer that data as a drop-down or form where folks could look it up themselves? All they would have to do is update it monthly. It seems like a better approach than answering RV questions one at a time on the forums.

What am I missing?

Licensing agreements that likely prevent them from openly publishing values like that.


From what I’ve gathered, the key to extracting and maintaining RV/MF data, via scraping or whatever method does the trick, is never posting that data in one place for anyone to see…allowing only for it to be accessed via request or as part of a larger output (a calculator, for example)

Understood. It is what it is, I guess… but it doesn’t seem like a fun job at Edmunds lol.

I feel really bad for that Michael guy