Author: Erin James Wills, [email protected]
Example showcasing how to incorporate time-dependent data into a multiple regression. An extra spreadsheet illustrates one-hot encoding of time data to obtain monthly estimates (see the extra section and its caveats about this process).
-
Dataset generation works well. The dataset should have a very high R². Since the dataset is designed to include a time series, dedimensionalizing the time series improves the linear fit slightly, raising R² from 0.99 to 0.9999.
-
The time series feature of the dataset is the outside temperature measurement. To make the energy usage of the home depend on the temperature history, the average temperature across three hours was used to calculate the heat flux of the house, with a standard internal house temperature of 73°F. To account for these historical influences, three heat flux features were created: 1) the flux 2 hours before, 2) the flux 1 hour before, and 3) the flux of the current hour. Each record carries these three values as its own short historical trend.
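The lagged features described above can be sketched with pandas. This is a minimal illustration, not the repo's actual generation code: the column names, the sample temperatures, and the use of a plain indoor/outdoor temperature difference as a stand-in for heat flux are all assumptions.

```python
import pandas as pd

INDOOR_TEMP_F = 73.0  # reference indoor temperature stated in the text

# Hypothetical hourly outdoor temperatures in °F (illustrative values only).
df = pd.DataFrame({"outdoor_temp_f": [30.0, 32.0, 35.0, 40.0, 46.0, 50.0]})

# Assumption: use the indoor/outdoor temperature difference as a proxy for
# heat flux. A physical model would also scale by insulation and surface area.
df["flux_now"] = INDOOR_TEMP_F - df["outdoor_temp_f"]

# Shift the series so each record also holds the flux 1 and 2 hours earlier.
df["flux_1h_before"] = df["flux_now"].shift(1)
df["flux_2h_before"] = df["flux_now"].shift(2)

# The first two rows lack a complete history, so they are dropped.
features = df.dropna().reset_index(drop=True)
print(features[["flux_2h_before", "flux_1h_before", "flux_now"]])
```

Each remaining row can then be fed to a regression as an independent record, since the history it needs travels with it as ordinary columns.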
- Add randomness into the dataset features and test regression models.
- Add features that are dependent on other features (slight dependencies)
- Transform features by different methods and retest regression
- Use temperature instead of heat load
- Test effects of using Kelvin, Celsius, Fahrenheit
- Should a temperature reference point be used? This might not be known for all cases. I used 73°F (about 22.8°C) as a common reference point. How can I get around needing this background knowledge?
- What if I bin square footage into labels instead of values? What effect does this have on the accuracy?
- Add features
- Daylight/Night time
- Cloudiness
- Wind Speed
- Is seasonality needed?
- Added monthly_predictions.xlsx. I only added this as an example of one-hot encoding. I don't really like this method of predicting in most cases because I doubt the trend will continue, given the underlying assumptions. I added it to the repo because I often see industry-specific training that uses Excel with something similar to this spreadsheet. There are better ways using Python (or even Excel).
- Excel
- Stats Toolkit Add-on
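For comparison with the spreadsheet, here is a rough Python sketch of one-hot encoding months to get monthly estimates. It does not reproduce monthly_predictions.xlsx: the synthetic daily usage values, the column names, and the choice of a no-intercept least-squares fit are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily energy usage for one year (synthetic, illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({"date": pd.date_range("2023-01-01", "2023-12-31", freq="D")})
df["month"] = df["date"].dt.month
df["usage"] = 20 + 5 * (df["month"] - 7).abs() + rng.normal(0, 2, len(df))

# One-hot encode the month: one 0/1 column per calendar month.
X = pd.get_dummies(df["month"], prefix="month", dtype=float)

# Ordinary least squares without an intercept on the full set of dummies.
coef, *_ = np.linalg.lstsq(X.to_numpy(), df["usage"].to_numpy(), rcond=None)
monthly_estimates = pd.Series(coef, index=X.columns)

# Plain monthly averages, for comparison with the fitted coefficients.
monthly_means = df.groupby("month")["usage"].mean()
print(monthly_estimates.round(2))
```

With only month dummies and no intercept, each coefficient is just the historical mean for that month. That makes the caveat above concrete: the estimate is only as good as the assumption that next year's months behave like last year's.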