We had car data collected from a website that advertises used cars.
The data had the following properties regarding damage.
1) If the damage is major, or is any damage the insurance company knows about, the owner
declares the car as damaged.
2) If the damage is minor, can be covered up, or the area has already been repainted, and the owner
thinks no one will notice, he does not declare the car as damaged.
What we have, then, is:
Car model
Year
Price
City
Date of publishing
Last update of advertisement
Days elapsed since publishing (a sold car is removed from the list)
Elapsed days until sale
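To make the fields above concrete, here is a minimal sketch of one advertisement record. The class name CarAd, the field names, and the sold_on field are assumptions made for illustration; the original data set may be laid out differently. The two "elapsed days" values are derived from the dates rather than stored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CarAd:
    # Field names are hypothetical; they mirror the list of available columns.
    model: str
    year: int
    price: float
    city: str
    published: date
    last_update: date
    sold_on: Optional[date] = None  # assumed: date the ad left the list

    def days_listed(self, today: date) -> int:
        """Days since publishing, or days until sale if the car is sold."""
        end = self.sold_on or today
        return (end - self.published).days


# Made-up example values, not taken from the real data.
ad = CarAd("ModelA", 2012, 15000.0, "CityX", date(2015, 1, 1), date(2015, 1, 20))
print(ad.days_listed(date(2015, 1, 31)))  # 30
```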
Assume there is no data that serves as a direct clue. We must generate, extract, and invent our own data.
a) So let's think about how the owner of a damaged car thinks.
b) What changes in the advertisement over time if the car has minor damage? (The owner
edits the advertisement over time.)
1) Number of page views
There is an average number of page views an advertisement gets before the owner deletes it.
Let's say an undamaged car is sold after roughly 100 page views. If a car is advertised
as undamaged and is still not sold after 100 page views, it may have a problem.
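One way to turn the page view idea into a number is to compare an ad's views with the typical views of comparable cars that did sell. The sketch below assumes we have such a list of view counts; the function name page_view_ratio and the sample numbers are made up for illustration.

```python
from statistics import mean


def page_view_ratio(views, sold_car_views):
    """Ratio of this ad's page views to the average views of sold, comparable cars.

    A ratio well above 1 means the ad has been seen more often than a
    typical car needs before it sells, yet it is still on the list.
    """
    baseline = mean(sold_car_views)
    return views / baseline


# Hypothetical numbers: comparable cars sold after about 100 views.
sold_car_views = [90, 100, 110]
ratio = page_view_ratio(150, sold_car_views)
print(ratio)  # 1.5
```

The ratio is used instead of a hard cut at 100 views so that popular and unpopular models can share the same feature.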
2) Number of price changes
At first the owner thinks he can sell his car at a price similar to undamaged cars. After a while he makes some discounts.
Probably after some calls he realizes he has to lower the price. So we can generate 2 variables from this:
% discount from the first price
# of discounts made
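Both variables can be computed from the sequence of asking prices of one advertisement. This is a minimal sketch that assumes the price history is available as a chronological list; the function name discount_features and the sample prices are made up for illustration.

```python
def discount_features(price_history):
    """Return (percent discount from first price, number of price cuts).

    price_history: asking prices in chronological order, first entry is
    the initial price. Assumed input format, not the original schema.
    """
    first, last = price_history[0], price_history[-1]
    pct_discount = 100.0 * (first - last) / first
    # Count only the edits where the price actually went down.
    n_discounts = sum(
        1 for prev, cur in zip(price_history, price_history[1:]) if cur < prev
    )
    return pct_discount, n_discounts


pct, n = discount_features([20000, 19000, 19000, 18000])
print(pct, n)  # 10.0 2
```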
3) Is the price lower than the average for cars with the same conditions?
A price below that average could reveal a sense of guilt on the owner's part.
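A possible way to measure this is the percentage gap between each ad's price and the average price of its peer group. The sketch below assumes "same conditions" means the same model, year, and city, and that ads are plain dictionaries; both are assumptions, and the sample values are made up.

```python
from collections import defaultdict
from statistics import mean


def price_gap_pct(ads):
    """Percent difference between each ad's price and its peer-group average.

    Peers are ads sharing model, year, and city (an assumed definition of
    "same conditions"). Negative values mean cheaper than the peers.
    The ad itself is included in its own group average.
    """
    groups = defaultdict(list)
    for ad in ads:
        groups[(ad["model"], ad["year"], ad["city"])].append(ad["price"])
    gaps = []
    for ad in ads:
        avg = mean(groups[(ad["model"], ad["year"], ad["city"])])
        gaps.append(100.0 * (ad["price"] - avg) / avg)
    return gaps


ads = [
    {"model": "ModelA", "year": 2012, "city": "CityX", "price": 10000},
    {"model": "ModelA", "year": 2012, "city": "CityX", "price": 12000},
    {"model": "ModelA", "year": 2012, "city": "CityX", "price": 8000},
]
gaps = price_gap_pct(ads)
print(gaps)  # [0.0, 20.0, -20.0]
```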
4) Duration on sale
Total time the car has been on sale.
5) Difference in days from the average sale duration of the same car model
There is an average sale duration for every combination of car properties. Each day (or week)
elapsed beyond that average increases the probability of damage.
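This feature is the number of days an ad has been listed beyond the average time its model takes to sell. The sketch assumes sale durations of already sold cars are grouped by model in a dictionary; the function name excess_days_on_sale and the numbers are made up for illustration.

```python
from statistics import mean


def excess_days_on_sale(model, days_on_sale, sold_days_by_model):
    """Days on sale minus the average days-to-sale of the same model.

    sold_days_by_model: dict mapping a model name to the list of days its
    sold cars needed (assumed structure). Positive values mean the car
    is taking longer than usual to sell.
    """
    avg = mean(sold_days_by_model[model])
    return days_on_sale - avg


sold_days_by_model = {"ModelA": [20, 30, 40]}
extra = excess_days_on_sale("ModelA", 45, sold_days_by_model)
print(extra)  # 15
```

Grouping by model only is the simplest case; the same lookup could use a key built from model, year, and city.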
6) Number of pictures in the advertisement
The owner of a damaged car will probably post no pictures, or only 1-2. Fewer pictures could mean a higher probability of damage.