Automatically extracting RV data

I saw in another post that Ally publishes all of its RV data, including historical RV data. I didn’t know about this! Of course, Ally’s RVs are probably very different from the captives’. But I still think there is probably a lot of useful information in there:

  • How much do RVs fluctuate historically?
  • Can we identify or predict trends in RVs based on prior months?
  • Are there any obnoxiously high RVs right now? (Cough Tacoma cough)

The data is in PDFs. For example, here is the current version. Edit: Apparently Ally really does not want you to link directly to anything on their site. Go to Ally dealer services, click Tools (upper right side of screen), and then Residual Value Lease Guide (RVLG).

I converted this with pdftotext and got something like the following, which looks annoying but probably parseable:

New Vehicle Residual Values
Model
Year
Make
2019
VOLVO

Model

Description

12

24 27 30 36 39 42 48 60

4dr Sdn T5 Momentum (860136 &105)

48

48 47 45 43 41 40 36 28

4dr Sdn T5 R-Design (860136 &111)

48

48 47 45 43 41 40 36 29
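For what it's worth, the output above can probably be parsed with plain Python. Here is a minimal sketch that assumes every page looks like the excerpt: a "Description" line, then the lease terms, then each trim followed by its values. The function name `parse_rv_text` and the dictionary keys are made up for illustration, and the real guide almost certainly has layouts this does not handle.

```python
# Excerpt of the pdftotext output shown above, used as test input.
SAMPLE = """\
Model

Description

12

24 27 30 36 39 42 48 60

4dr Sdn T5 Momentum (860136 &105)

48

48 47 45 43 41 40 36 28

4dr Sdn T5 R-Design (860136 &111)

48

48 47 45 43 41 40 36 29
"""


def is_numeric(line):
    """True if every whitespace-separated token on the line is an integer."""
    return all(tok.isdigit() for tok in line.split())


def parse_rv_text(text):
    """Return (terms, rows) from pdftotext-style output.

    Assumes the layout of the excerpt: numeric lines right after
    'Description' are lease terms in months; after that, each non-numeric
    line is a trim description followed by its residual values.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("Description") + 1

    terms, rows, current = [], [], None
    for line in lines[start:]:
        if is_numeric(line):
            nums = [int(tok) for tok in line.split()]
            if current is None:
                terms.extend(nums)              # still reading the header
            else:
                current["values"].extend(nums)  # values for the current trim
        else:
            current = {"trim": line, "values": []}
            rows.append(current)

    # Keep only rows that have exactly one value per term.
    parsed = [
        {"trim": r["trim"], "rv": dict(zip(terms, r["values"]))}
        for r in rows
        if len(r["values"]) == len(terms)
    ]
    return terms, parsed


terms, rows = parse_rv_text(SAMPLE)
for row in rows:
    print(row["trim"], "36-month RV:", row["rv"][36])
```

Rows whose value count does not match the number of terms are silently dropped here, so on a full guide you would want to log those rather than trust the result.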

I suspect there are better tools for extracting data from PDFs. Any ideas?

  1. Well, if we’re talking over multiple model years, in ‘17 they had crazy high 24-month Mercedes RVs. IIRC Micheal scored a crazy E-Class deal; I wasn’t here during that time.

  2. :crystal_ball: This is like trying to predict programs; @RVguy would be the guy to talk to.

My robotics club did something similar using Google Cloud for PDF-to-text conversion. It’s free for the first 1k images. Does Azure have something similar?

Python-pdfminer seems a lot more powerful. I’ve been playing with the included pdf2txt to dump HTML, but I haven’t been able to get it to group the elements of each row together, presumably since they are so far apart. But I suspect that is possible, and possibly easy, from the Python API.
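To make the row-grouping idea concrete: if you can get each text element's position out of the PDF, one approach is to bucket elements by their vertical coordinate. This is only a sketch of that technique, not pdfminer's own API. The `(x0, y0, text)` tuples, the `group_into_rows` name, and the `tolerance` value are all hypothetical stand-ins for whatever the layout objects actually give you.

```python
def group_into_rows(elements, tolerance=2.0):
    """Group positioned text elements into table rows.

    elements: iterable of (x0, y0, text) tuples (hypothetical format).
    PDF coordinates grow upward, so a larger y0 is higher on the page.
    Elements whose y0 is within `tolerance` of the first element in a
    row are treated as part of that row.
    Returns rows top to bottom, each a list of texts left to right.
    """
    rows = []  # list of (row_y, [(x0, text), ...])
    for x0, y0, text in sorted(elements, key=lambda e: -e[1]):
        if rows and abs(rows[-1][0] - y0) <= tolerance:
            rows[-1][1].append((x0, text))
        else:
            rows.append((y0, [(x0, text)]))
    # Sort each row's cells by x position and drop the coordinates.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


# Made-up coordinates, with text borrowed from the excerpt above.
elements = [
    (300, 500.2, "48"),
    (50, 500.0, "4dr Sdn T5 Momentum"),
    (350, 500.1, "48 47 45 43 41 40 36 28"),
    (50, 480.0, "4dr Sdn T5 R-Design"),
    (300, 480.3, "48"),
]
for row in group_into_rows(elements):
    print(row)
```

The tolerance matters: too small and a row splits when the baselines differ slightly, too large and adjacent rows merge.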

Cool, will check out Google’s cloud thing. There seem to be a ton of cloud offerings.

Passing -L 5000 seemed to convince pdf2txt to put everything on the same line. I’ll play with that some more tonight or over lunch.

This might be extremely interesting, but the issue is that the data collection is going to be very time-consuming if you want it to reflect RVs for all banks.

Other than the Edmunds forums, which can help you with a few zip codes and models at a time at most, it is going to be very difficult to collect the data needed to make these calculations for captive lenders such as BMWFS and VCFS.

Agreed completely. My plan is just to analyze the Ally data since it’s there and see if we can do anything useful with it.

I’d love captive data, but that’s more complex, of course. Maybe we can use the Edmunds Deals? I also just saw that this morning. @RustyDaemon want to start scraping that too? :laughing:

I tried Edmunds Deals via that link, but could not see MF info… I’ll look at it later.

It’s hard to find at first, but I was able to get it to work using the instructions in the post. Let me know if you keep having trouble and I can take some more screenshots.

Due to my job, I’ve got all the historical data on every lender in most regions. RVs, rates, residuals and all applicable rebates. Going back about 6 yrs.

I have tried forecasting each component on every lender and found that there is nothing predictive within the Ally RVs themselves. Their adjustments are all done relative to ALG’s RVs and those adjustments (usually in the +5 to +8 range) occasionally change in tandem with their rate changes or if they push into a new brand.

Plus there are very few models currently where Ally has the best program.

US Bank is almost always -1 or -2 relative to ALG. But their rates are changing more frequently at the model-level to feather volume.

There is a fairly cheap desking tool called LeaseScan, which dealers pay about $1k/mo for, that may give you access to all the programs around the country. I’m sure they would license it to a broker. From the screenshots and a YouTube video with a demo of it back in 2011, it’s pretty retro and I doubt much has changed.

How do you remember that correctly if you weren’t here during that time? :thinking:

I read a lot :wink:. How else would I have 10d of read time :laughing:.

I go back in posts, on articles, etc. There’s not much to do, got to keep myself entertained somehow.

You know there are these things called books too :slight_smile:

I’d go even further and say “girls”

You read girls?

Haven’t had much time to work on this, but finally got pdfminer to do what I want.

Here’s a Google Colab notebook where you can see the data in an almost structured format.

You can also modify it via the web interface very easily if you want to work on it.

Maybe a stupid question, but I’ve always asked myself: if the Edmunds forums have RVs for vehicles, wouldn’t it be much easier for them to offer that data as a drop-down or form where folks could look it up themselves? All they would have to do is update it monthly. It seems like a better approach than answering RV questions from folks on the forums.

What am I missing?

Licensing agreements that likely prevent them from openly publishing values like that.

From what I’ve gathered, the key to extracting and maintaining RV/MF data, via scraping or whatever method does the trick, is never posting that data in one place for anyone to see…allowing only for it to be accessed via request or as part of a larger output (a calculator, for example).

Understood. It is what it is, I guess… but it doesn’t seem like a fun job at Edmunds lol.
