projected and target_* values for steel output of target_sda() seem off #160

Closed
jdhoffa opened this issue Aug 4, 2020 · 4 comments · Fixed by #164, #167 or #169

Member

jdhoffa commented Aug 4, 2020

From a conversation with a beta-tester:

Something seems to be off with the steel projected and target output (the values are very high):

sector  year  emission_factor_metric  emission_factor_value
steel   2019  projected               4.872001912
steel   2019  corporate_economy       1.867306771
steel   2019  target_b2ds             4.872001912
steel   2019  adjusted_scenario_b2ds  1.867306771

I checked the CO2 intensity values of matched companies and they are all less than 4:

year  min_value  max_value
2019  0.0236     3.02
2020  0.0236     3.02
2021  0.0236     3.02
2022  0.0236     3.02
2023  0.0236     3.02
2024  0.0236     3.02
2025  0.0236     3.02

I am unable to reproduce this error as of now, but will try again and post here as soon as I have.
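Assuming the projected metric is a weighted average of the matched companies' emission factors, with non-negative weights that sum to 1, it can never exceed the largest matched intensity. A projected value of 4.87 against a matched maximum of 3.02 therefore already hints that the weights are not normalized somewhere. A minimal base-R check of that bound, using the min/max intensities from the table above and hypothetical loan sizes:

```r
# Sanity check: a normalized weighted average is bounded by its inputs.
# Intensities are the min/max from the table above; loan sizes are hypothetical.
intensity <- c(0.0236, 3.02)
loan_size <- c(5, 1000)

weights <- loan_size / sum(loan_size)  # non-negative and sums to 1
projected <- sum(intensity * weights)

projected <= max(intensity)    # TRUE
projected >= min(intensity)    # TRUE

# The reported projected value lies outside [0.0236, 3.02], so it cannot be
# a normalized weighted average of these intensities.
4.872001912 <= max(intensity)  # FALSE
```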

@jdhoffa jdhoffa self-assigned this Aug 4, 2020
Member Author

jdhoffa commented Aug 5, 2020

Hey @QianFeng2020, so far I have been unable to reproduce this error locally using the latest dev version of the code. See the following reprex, where I have tried a few of the possibilities we discussed over the call:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)

# here I have recreated a fake match_result with the crucial columns. Note how
# id_loan 1 is repeated three times, similar to what we had discussed. Also note
# how there are three very small loans to a "small company" and one large loan
# to a "large company"
match_result <- tibble::tribble(
  ~id_loan, ~loan_size_outstanding, ~loan_size_credit_limit, ~id_2dii,            ~level, ~score,  ~sector,       ~name_ald, ~sector_ald,
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "cement", "small company",    "cement",
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "cement", "small company",    "cement",
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "cement", "small company",    "cement",
  2,                  1e+06,                       2,    "UP1", "ultimate_parent",      1, "cement", "large company",    "cement"
)

# show interesting values
match_result %>% 
  select(id_loan, loan_size_outstanding, name_ald, sector_ald) %>% 
  head()
#> # A tibble: 4 x 4
#>   id_loan loan_size_outstanding name_ald      sector_ald
#>     <dbl>                 <dbl> <chr>         <chr>     
#> 1       1                     1 small company cement    
#> 2       1                     1 small company cement    
#> 3       1                     1 small company cement    
#> 4       2               1000000 large company cement

# here we have the two matched companies, as well as one unmatched company
ald <- tibble::tribble(
  ~name_company,  ~sector, ~technology, ~year, ~production, ~emission_factor, ~plant_location, ~is_ultimate_owner,
  "small company", "cement",  "bof shop",  2020,           1,           0.0236,            "DE",               TRUE,
  "large company", "cement",  "bof shop",  2020,           1,             3.02,            "DE",               TRUE,
  "unmatched company", "cement",  "bof shop",  2020,           1,           2.5574,            "DE",               TRUE,
  "small company", "cement",  "bof shop",  2025,           1,           0.0236,            "DE",               TRUE,
  "large company", "cement",  "bof shop",  2025,           1,             3.02,            "DE",               TRUE,
  "unmatched company", "cement",  "bof shop",  2025,           1,           2.5574,            "DE",               TRUE
)

ald %>% 
  select(name_company, sector, emission_factor) %>% 
  head(3)
#> # A tibble: 3 x 3
#>   name_company      sector emission_factor
#>   <chr>             <chr>            <dbl>
#> 1 small company     cement          0.0236
#> 2 large company     cement          3.02  
#> 3 unmatched company cement          2.56

# and a scenario with totally arbitrary values
scen <- tibble::tribble(
  ~scenario,  ~sector,  ~region, ~year, ~emission_factor,           ~emission_factor_unit, ~scenario_source,
  "b2ds", "cement", "global",  2020,             4.87, "tons of CO2 per ton of cement",      "demo_2020",
  "b2ds", "cement", "global",  2025,              1.5, "tons of CO2 per ton of cement",      "demo_2020"
)

scen %>% 
  select(scenario, sector, year, emission_factor)
#> # A tibble: 2 x 4
#>   scenario sector  year emission_factor
#>   <chr>    <chr>  <dbl>           <dbl>
#> 1 b2ds     cement  2020            4.87
#> 2 b2ds     cement  2025            1.5

# the output of target_sda looks good, with the upper bound dictated by the 
# max(emission_factor) of matched companies
target_sda(match_result, 
           ald, 
           scen) %>% 
  filter(year == min(year))
#> # A tibble: 4 x 4
#>   sector  year emission_factor_metric emission_factor_value
#>   <chr>  <dbl> <chr>                                  <dbl>
#> 1 cement  2020 projected                               3.02
#> 2 cement  2020 corporate_economy                       1.87
#> 3 cement  2020 target_b2ds                             3.02
#> 4 cement  2020 adjusted_scenario_b2ds                  1.87

Created on 2020-08-05 by the reprex package (v0.3.0)
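The corporate_economy value of 1.87 in this output matches a production-weighted mean over all three ald companies in 2020, matched or not; because each company has production 1, that reduces to a simple mean. This is an inference from the reprex numbers, not a description of target_sda() internals, and the variable names below are illustrative only:

```r
# 2020 emission factors of the three ald companies in the reprex above
ef_2020 <- c(small = 0.0236, large = 3.02, unmatched = 2.5574)
production_2020 <- c(1, 1, 1)

# Production-weighted mean; with equal production this is the plain mean
corporate_economy <- sum(ef_2020 * production_2020) / sum(production_2020)
round(corporate_economy, 2)  # 1.87
```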

Member Author

jdhoffa commented Aug 5, 2020

Thanks @QianFeng2020, I'm able to reproduce the error now. See the following reprex. It does indeed seem to be an issue with duplicated id_loan values:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)

match_result <- tibble::tribble(
  ~id_loan, ~loan_size_outstanding, ~loan_size_credit_limit, ~id_2dii,            ~level, ~score,      ~sector,       ~name_ald, ~sector_ald,
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "automotive", "large company",    "cement",
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "automotive", "large company",    "cement"
)

ald <- tibble::tribble(
  ~name_company,  ~sector, ~technology, ~year, ~production, ~emission_factor, ~plant_location, ~is_ultimate_owner,
  "large company", "cement",       "ice",  2020,           1,                2,            "BF",               TRUE,
  "large company", "cement",       "ice",  2025,           1,                2,            "BF",               TRUE
)


scen <- tibble::tribble(
  ~scenario,  ~sector,  ~region, ~year, ~emission_factor,           ~emission_factor_unit, ~scenario_source,
  "b2ds", "cement", "global",  2020,                1, "tons of CO2 per ton of cement",      "demo_2020",
  "b2ds", "cement", "global",  2025,              0.5, "tons of CO2 per ton of cement",      "demo_2020"
)

target_sda(match_result,
           ald,
           scen) %>%
  filter(year == min(year))
#> # A tibble: 4 x 4
#>   sector  year emission_factor_metric emission_factor_value
#>   <chr>  <dbl> <chr>                                  <dbl>
#> 1 cement  2020 projected                                  4
#> 2 cement  2020 corporate_economy                          2
#> 3 cement  2020 target_b2ds                                4
#> 4 cement  2020 adjusted_scenario_b2ds                     2

Created on 2020-08-05 by the reprex package (v0.3.0)

Working on it now.

Member Author

jdhoffa commented Aug 5, 2020

Note that with unique id_loan values, the projected emission factors are calculated as expected:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)

match_result <- tibble::tribble(
  ~id_loan, ~loan_size_outstanding, ~loan_size_credit_limit, ~id_2dii,            ~level, ~score,      ~sector,       ~name_ald, ~sector_ald,
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "automotive", "large company",    "cement",
  2,                      1,                       2,    "UP1", "ultimate_parent",      1, "automotive", "large company",    "cement"
)

ald <- tibble::tribble(
  ~name_company,  ~sector, ~technology, ~year, ~production, ~emission_factor, ~plant_location, ~is_ultimate_owner,
  "large company", "cement",       "ice",  2020,           1,                2,            "BF",               TRUE,
  "large company", "cement",       "ice",  2025,           1,                2,            "BF",               TRUE
)


scen <- tibble::tribble(
  ~scenario,  ~sector,  ~region, ~year, ~emission_factor,           ~emission_factor_unit, ~scenario_source,
  "b2ds", "cement", "global",  2020,                1, "tons of CO2 per ton of cement",      "demo_2020",
  "b2ds", "cement", "global",  2025,              0.5, "tons of CO2 per ton of cement",      "demo_2020"
)

target_sda(match_result,
           ald,
           scen) %>%
  filter(year == min(year))
#> # A tibble: 4 x 4
#>   sector  year emission_factor_metric emission_factor_value
#>   <chr>  <dbl> <chr>                                  <dbl>
#> 1 cement  2020 projected                                  2
#> 2 cement  2020 corporate_economy                          2
#> 3 cement  2020 target_b2ds                                2
#> 4 cement  2020 adjusted_scenario_b2ds                     2

Created on 2020-08-05 by the reprex package (v0.3.0)
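One mechanism that reproduces both results (4 with the duplicated id_loan, 2 with unique ones) is a loan weight whose denominator counts each distinct loan once while the numerator sums over every matched row. The base-R sketch below models only that hypothesis; projected_ef() is a hypothetical helper and is not taken from the target_sda() source.

```r
# Hypothetical, simplified model of a loan-weighted aggregation.
# NOT the actual target_sda() implementation; it only shows how a row-level
# numerator and a distinct-loan denominator can disagree.
projected_ef <- function(matched, ef) {
  # Portfolio total counts each distinct id_loan once...
  distinct_loans <- matched[!duplicated(matched$id_loan), ]
  total <- sum(distinct_loans$loan_size_outstanding)
  # ...but every matched row (duplicates included) receives a weight.
  weights <- matched$loan_size_outstanding / total
  sum(weights * ef[matched$name_ald])
}

# Company emission factor from the reprex ald data (2020)
ef <- c("large company" = 2)

duplicated_loans <- data.frame(
  id_loan = c(1, 1),
  loan_size_outstanding = c(1, 1),
  name_ald = "large company",
  stringsAsFactors = FALSE
)
unique_loans <- transform(duplicated_loans, id_loan = c(1, 2))

projected_ef(duplicated_loans, ef)  # 4, like the duplicated-id_loan reprex
projected_ef(unique_loans, ef)      # 2, like the unique-id_loan reprex

# Dropping duplicated loan rows before weighting restores the expected value
deduped <- duplicated_loans[!duplicated(duplicated_loans$id_loan), ]
projected_ef(deduped, ef)           # 2
```

If this model is right, either deduplicating id_loan rows or normalizing by the same row set in numerator and denominator would remove the inflation.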

@jdhoffa jdhoffa reopened this Aug 6, 2020
Member Author

jdhoffa commented Aug 6, 2020

While the identified id_loan duplication was causing issues, it seems this bug goes deeper. In particular, if a sector has multiple technologies, the weighted_emission_factor output is too damn high (4 instead of 2 in this reprex):

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(r2dii.data)
library(r2dii.match)
library(r2dii.analysis)

match_result <- tibble::tribble(
  ~id_loan, ~loan_size_outstanding, ~loan_size_credit_limit, ~id_2dii,            ~level, ~score,      ~sector,       ~name_ald, ~sector_ald,
  1,                      1,                       2,    "UP1", "ultimate_parent",      1, "automotive", "large company",    "cement",
  2,                      1,                       2,    "UP1", "ultimate_parent",      1, "automotive", "large company",    "cement"
)

ald <- tibble::tribble(
  ~name_company,  ~sector, ~technology, ~year, ~production, ~emission_factor, ~plant_location, ~is_ultimate_owner,
  "large company", "cement",       "abc",  2020,           1,                2,            "BF",               TRUE,
  "large company", "cement",       "def",  2020,           1,                2,            "BF",               TRUE,
  "large company", "cement",       "abc",  2025,           1,                2,            "BF",               TRUE,
  "large company", "cement",       "def",  2025,           1,                2,            "BF",               TRUE
)


scen <- tibble::tribble(
  ~scenario,  ~sector,  ~region, ~year, ~emission_factor,           ~emission_factor_unit, ~scenario_source,
  "b2ds", "cement", "global",  2020,                1, "tons of CO2 per ton of cement",      "demo_2020",
  "b2ds", "cement", "global",  2025,              0.5, "tons of CO2 per ton of cement",      "demo_2020"
)

target_sda(match_result,
           ald,
           scen) %>%
  filter(year == min(year))
#> # A tibble: 4 x 4
#>   sector  year emission_factor_metric emission_factor_value
#>   <chr>  <dbl> <chr>                                  <dbl>
#> 1 cement  2020 projected                                  4
#> 2 cement  2020 corporate_economy                          2
#> 3 cement  2020 target_b2ds                                4
#> 4 cement  2020 adjusted_scenario_b2ds                     2

Created on 2020-08-06 by the reprex package (v0.3.0)
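The 4-vs-2 result here can be reproduced by assuming that each loan is joined to every technology row of the matched company and weighted only by loan size, so a company's intensity is counted once per technology. That is an assumption, not a reading of the target_sda() source; the base-R sketch below uses hypothetical names (naive, fixed, tech_share) and contrasts it with weighting each technology by its share of the company's sector production.

```r
# Hypothetical model: two unique loans of size 1 to one company whose ald
# data has two technology rows ("abc", "def"), each with emission factor 2.
loan_weight <- c(0.5, 0.5)  # loan sizes normalized to sum to 1
ald_2020 <- data.frame(
  technology = c("abc", "def"),
  production = c(1, 1),
  emission_factor = c(2, 2),
  stringsAsFactors = FALSE
)

# Naive: every (loan, technology) pair contributes emission_factor * loan_weight
naive <- sum(outer(loan_weight, ald_2020$emission_factor))
naive  # 4: the intensity is counted once per technology

# Weight each technology by its share of the company's sector production
tech_share <- ald_2020$production / sum(ald_2020$production)
fixed <- sum(outer(loan_weight, ald_2020$emission_factor * tech_share))
fixed  # 2
```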
