class: center, middle, inverse, title-slide

# Wish I'd known: Project workflow and reproducible science

### Bai Li
National Stock Assessment Modeling Team
Contractor with ECS in support of NOAA Fisheries
Office of Science and Technology
bai.li@noaa.gov
### April 21, 2021
Updated on 2021-04-21
University of Maine

---
layout: true

.footnote[U.S. Department of Commerce | National Oceanic and Atmospheric Administration | National Marine Fisheries Service]

---

# About me

- Education

<table class="table" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Degree </th> <th style="text-align:right;"> Year </th> <th style="text-align:left;"> Major </th> <th style="text-align:left;"> University </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Ph.D. </td> <td style="text-align:right;"> 2018 </td> <td style="text-align:left;"> Marine Biology </td> <td style="text-align:left;"> University of Maine, Orono, ME, USA </td> </tr> <tr> <td style="text-align:left;"> B.S. </td> <td style="text-align:right;"> 2013 </td> <td style="text-align:left;"> Marine Biology </td> <td style="text-align:left;"> University of Maine, Orono, ME, USA </td> </tr> <tr> <td style="text-align:left;"> B.S. </td> <td style="text-align:right;"> 2012 </td> <td style="text-align:left;"> Marine Resources </td> <td style="text-align:left;"> Shanghai Ocean University, Shanghai, China </td> </tr> </tbody> </table>

<br>

- Positions

<table class="table" style="font-size: 18px; width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> Year </th> <th style="text-align:left;"> Position </th> <th style="text-align:left;"> Affiliation </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 2020-Present </td> <td style="text-align:left;"> External Graduate Faculty </td> <td style="text-align:left;"> School of Marine Sciences, University of Maine </td> </tr> <tr> <td style="text-align:left;"> 2019-2020 </td> <td style="text-align:left;"> Postdoc </td> <td style="text-align:left;"> Research Associateship, National Research Council </td> </tr> <tr> <td style="text-align:left;"> 2018-2019 </td> <td style="text-align:left;"> Postdoc </td> <td style="text-align:left;"> School of Marine Sciences, University of Maine </td> </tr> </tbody> </table>

---

# Presentation Overview

- Projects
  + Age-structured stock assessment package comparison
  + Fisheries Integrated Modeling System (FIMS)
  + R interface to the Metapopulation Assessment System (r4MAS)

<br>

- Project workflow and programming resources
  + Clear project structure and workflow
  + Version control system
  + GitHub Actions

---

# Age-structured stock assessment package comparison

.pull-left[
<img src="static/2020_model_comparison_project/dichmont_us_fisheries_councils.jpg" width="3333" style="display: block; margin: auto auto auto 0;" />

*<sup>1</sup>Dichmont et al., 2016*
]

.footnote[
[1] https://doi.org/10.1016/j.fishres.2016.07.001
]

.pull-right[
- Packages
  - Assessment Model for Alaska (AMAK)
  - Age-Structured Assessment Program (ASAP)
  - Beaufort Assessment Model (BAM)
  - Stock Synthesis (SS)
]

---

# Age-structured stock assessment package comparison
--- # Age-structured stock assessment package comparison  *<sup>1</sup>Deroba et al., 2015* .footnote[ [1] https://doi.org/10.1093/icesjms/fst237 ] ??? Assumptions and configuration choices --- # Age-structured stock assessment package comparison - Feature comparison and common requirements (Part I) <table class="table" style="font-size: 14px; width: auto !important; float: left; margin-right: 10px;"> <thead> <tr> <th style="text-align:left;"> Feature </th> <th style="text-align:left;"> AMAK </th> <th style="text-align:left;"> ASAP </th> <th style="text-align:left;"> BAM </th> <th style="text-align:left;"> SS </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Age modeled </td> <td style="text-align:left;"> 1+ </td> <td style="text-align:left;"> 1+ </td> <td style="text-align:left;"> 1+ </td> <td style="text-align:left;"> 0+/1+ </td> </tr> <tr> <td style="text-align:left;"> Timing of spawning </td> <td style="text-align:left;"> Real month </td> <td style="text-align:left;"> Fraction </td> <td style="text-align:left;"> Fraction </td> <td style="text-align:left;"> Real month </td> </tr> <tr> <td style="text-align:left;"> Timing of survey </td> <td style="text-align:left;"> Real month </td> <td style="text-align:left;"> Real month </td> <td style="text-align:left;"> Fraction </td> <td style="text-align:left;"> Real month </td> </tr> <tr> <td style="text-align:left;"> Survey index unit </td> <td style="text-align:left;"> Biomass/Number </td> <td style="text-align:left;"> Biomass/Number </td> <td style="text-align:left;"> Biomass/Number </td> <td style="text-align:left;"> Biomass/Number </td> </tr> <tr> <td style="text-align:left;"> Spawner-recruit model </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;"> ·Standard Beverton-Holt </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td 
style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Ricker </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Average recruitment </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> Bias adjustment of recruitment </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> Types of selectivity available </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;"> ·Free parameter approach </td> <td style="text-align:left;"> Bound </td> <td style="text-align:left;"> Random walk </td> <td style="text-align:left;"> Logit </td> <td style="text-align:left;"> Random walk/logit </td> </tr> <tr> <td style="text-align:left;"> ·Simple logistic function </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Double logistic function </td> <td style="text-align:left;"> Y (3 parameters) </td> <td style="text-align:left;"> Y (4 parameters) </td> <td style="text-align:left;"> Y (4 parameters) </td> <td style="text-align:left;"> Y (4 parameters) </td> </tr> <tr> <td style="text-align:left;"> ·Logistic-exponential function </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Joint-logistic 
function </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Double-Gaussian function </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> </tbody> </table> --- # Age-structured stock assessment package comparison - Feature comparison and common requirements (Part II) <table class="table" style="font-size: 14px; width: auto !important; float: left; margin-right: 10px;"> <thead> <tr> <th style="text-align:left;"> Feature </th> <th style="text-align:left;"> AMAK </th> <th style="text-align:left;"> ASAP </th> <th style="text-align:left;"> BAM </th> <th style="text-align:left;"> SS </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> F in terminal year </td> <td style="text-align:left;"> Last Year </td> <td style="text-align:left;"> Last Year </td> <td style="text-align:left;"> Flexible </td> <td style="text-align:left;"> Last Year </td> </tr> <tr> <td style="text-align:left;"> Definition of F </td> <td style="text-align:left;"> Flexible </td> <td style="text-align:left;"> Apical F </td> <td style="text-align:left;"> Apical F </td> <td style="text-align:left;"> Flexible </td> </tr> <tr> <td style="text-align:left;"> Likelihoods available </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;"> ·Landings_Lognormal </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Survey index_Lognormal </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y 
</td> </tr> <tr> <td style="text-align:left;"> ·Age composition_Standard multinomial </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Age composition_Dirichlet multinomial </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> Priors </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> </tr> <tr> <td style="text-align:left;"> ·None </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Lognormal </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Beta </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> <tr> <td style="text-align:left;"> ·Normal </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> </td> <td style="text-align:left;"> Y </td> <td style="text-align:left;"> Y </td> </tr> </tbody> </table> *<sup>1</sup>Li et al., In review* .footnote[ [1] B. Li, K. W. Shertzer, P. D. Lynch, J. N. Ianelli, C. M. Legault, E. H. Williams, R. D. Methot Jr, E. N. Brooks, J. J. Deroba, A. M. Berger, S. R. Sagarese, J. K.T. Brodziak, I. G. Taylor, M. A. Karp, C. R. Wetzel, and M. Supernaw. A comparison of four primary age-structured stock assessment models used in the United States. In Review. Fishery Bulletin. 
]

---

# Age-structured stock assessment package comparison

- Beverton-Holt spawner-recruit model bias correction
  - Difference between geometric- and arithmetic-mean R0
  - Difference between geometric- and arithmetic-mean h

<img src="static/2020_model_comparison_project/r0_h_bias correction.jpg" width="12597" style="display: block; margin: auto;" />

---

# Age-structured stock assessment package comparison

- Conversion

<img src="static/2020_model_comparison_project/msy_bias_correctiono.jpg" width="80%" style="display: block; margin: auto;" />

---

# Age-structured stock assessment package comparison

.pull-left[
- Stock assessment training
- Base case discussion
- Case prioritization
- Research phases and monthly reporting
- Weekly check-in chat
- Meeting notes and actions for next meeting
]

.pull-right[
]

.footnote[
http://www.planningplanet.com/sites/default/files/imagecache/wysiwyg_full_page/wysiwyg_imageupload/42/reporting.gif
]

---

# Age-structured stock assessment package comparison

<img src="static/r4MAS/github_project.PNG" width="2007" style="display: block; margin: auto;" />

---

# Age-structured stock assessment package comparison

.pull-left[
- Inputs/outputs standardization
  - [DMAS](https://github.com/Bai-Li-NOAA/DMAS)

Installation instructions

```r
install.packages("remotes")
remotes::install_github("Bai-Li-NOAA/DMAS")
library(DMAS)
```

Load assessment examples

```r
asap_input <- DMAS::asap_simple_input
asap_output <- DMAS::asap_simple_output
ss_input <- DMAS::ss_empirical_waa_input
```
]

.pull-right[
*<sup>1</sup>ICES Transparent Assessment Framework Web App*
]

.footnote[
[1] https://taf.ices.dk/app/about
]

---

# Age-structured stock assessment package comparison

- Technical skills I learned that did not make it into the manuscript
  - Handle assessment model inputs and outputs with available R packages
      - [ASAPplots](https://github.com/cmlegault/ASAPplots) and ASAP
      - [FishGraph](https://github.com/RobCheshire-NOAA/FishGraph) and BAM
      - [r4ss](https://github.com/r4ss/r4ss) and SS
  - Run assessment models in R

```r
# Read/Write helpers are assumed to be loaded from ASAPplots;
# `em_input` and `casedir` are defined earlier in the analysis script.
asap_input <- ReadASAP3DatFile("asap3.DAT")
# Replace fleet 1 catch-at-age (last column holds total landings)
asap_input$dat$CAA_mats[[1]] <- cbind(em_input$L.age.obs$fleet1, em_input$L.obs$fleet1)
WriteASAP3DatFile(fname = "asap3.DAT", dat.object = asap_input)
# Call the ASAP executable on the updated input file
system(paste(file.path(casedir, "ASAP3.exe"), file.path(casedir, "asap3.DAT")))
```

---

# Fisheries Integrated Modeling System (FIMS)

.pull-left[
<img src="static/FIMS/noaa_fisheries_news.jpg" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
- OST + Science Centers
- Implementation team
- Testing
- Parallel work
- Test procedures
- Test cases
- Code review
]

.footnote[
[1] https://www.fisheries.noaa.gov/national/population-assessments/fisheries-integrated-modeling-system
]

---

# R interface to the Metapopulation Assessment System

.pull-left[
- MAS: Metapopulation Assessment System
- R interface
  - Build and run MAS models directly using R
  - [NOAA Fisheries Integrated Toolbox](https://noaa-fisheries-integrated-toolbox.github.io/) => [Fish and Fisheries](https://nmfs-fish-tools.github.io/)
  - [Testing: r4MAS GitHub repository](https://github.com/nmfs-fish-tools/r4MAS/tree/master/tests/testthat)
  - [Documentation: r4MAS site](https://nmfs-fish-tools.github.io/r4MAS/)

*<sup>1</sup>Goethel and Berger, 2017*
]

.pull-right[
]

.footnote[
[1] https://doi.org/10.1139/cjfas-2016-0290
]

???

Tool for spatial modeling of fish populations

---

# R interface to the Metapopulation Assessment System

.pull-left[
]

.pull-right[
- Automated testing
  - Unit testing
  - Integration testing
  - System testing
  - Acceptance testing
]

.footnote[
[1] https://miro.medium.com/max/14000/1*_hkNXL7RuIbrwA4VclW0yg.jpeg
]

---

# Project workflow and programming resources

.pull-left[
]

.pull-right[
- 500+ lines of code in one R script
- No version control for the code
- final.R => final2.R => final_final.R

How to make science reproducible and make your life easier?
]

.footnote[
[1] http://www.phdcomics.com/comics/archive/phd101212s.gif
]

---

# Clear project structure and workflow

.pull-left[
```
##                       levelName
## 1  project
## 2   ¦--doc
## 3   ¦   ¦--manuscript.Rmd
## 4   ¦   ¦--notes.Rmd
## 5   ¦   °--requirement.Rmd
## 6   ¦--data-raw
## 7   ¦   °--data.csv
## 8   ¦--data
## 9   ¦   °--data.RData
## 10  ¦--R
## 11  ¦   °--function.R
## 12  ¦--Script
## 13  ¦   °--run_model.R
## 14  °--output
## 15      ¦--summary_statistics.csv
## 16      °--figure.jpeg
```

- Functions vs. scripts
]

.pull-right[
[Practical R Workflow for Scientists Summer 2020](https://rverse-tutorials.github.io/RWorkflow-NWFSC-2020/index.html)
]

---

# Version control system

"A system that records changes to a file or a set of files over time so that you can recall specific versions later"

- Revert selected files back to a previous state
- Revert the entire project back to a previous state
- Compare changes over time
- Git and GitHub

.pull-left[
]

.pull-right[
]

.footnote[
Source: https://git-scm.com/; https://github.com/
]

???

Git is a free and open source distributed version control system.

GitHub uses Git to provide internet hosting for software development and version control.

---

# GitHub Actions

- Automate tasks within the software development life cycle
- Commands run in response to events (e.g. push and pull request)
- Automated multi-platform testing
- Jobs, steps, and actions

<img src="static/general/github_actions.PNG" width="2476" style="display: block; margin: auto;" />

---

# In summary

- Project-oriented workflow
- Standardized input/output of your project and tool*<sup>1,2</sup>*
- Generate standardized reports with R Markdown*<sup>3</sup>*
- Collaborative work with Git and GitHub*<sup>4</sup>*
- Produce web-based documentation of your research*<sup>5</sup>*

.footnote[
More resources: <br>
[1] [Taking your data to go with R packages](https://www.davekleinschmidt.com/r-packages/) <br>
[2] [Making your first R package](https://tinyheero.github.io/jekyll/update/2015/07/26/making-your-first-R-package.html) <br>
[3] [R Markdown](https://rmarkdown.rstudio.com/) <br>
[4] [Introduction to Git within RStudio](https://rverse-tutorials.github.io/RWorkflow-NWFSC-2020/intro-git.html) <br>
[5] [pkgdown](https://pkgdown.r-lib.org/index.html)
]

---

# Thanks!

bai.li@noaa.gov
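
---

# Appendix: project skeleton sketch

The folder layout from the "Clear project structure and workflow" slide could be scaffolded with a few lines of base R. This is only a minimal sketch: the helper name `create_project_skeleton()` and the default root `"project"` are assumptions made for illustration and do not come from any package mentioned in this talk.

```r
# Hypothetical helper (illustration only): creates the top-level folders
# shown on the project-structure slide, using base R functions.
create_project_skeleton <- function(root = "project") {
  subdirs <- c("doc", "data-raw", "data", "R", "Script", "output")
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # recursive = TRUE also creates `root` itself;
    # showWarnings = FALSE keeps reruns quiet if a folder already exists
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Usage: build the skeleton in a temporary location and list its folders
root <- file.path(tempdir(), "project")
created <- create_project_skeleton(root)
list.dirs(root, full.names = FALSE, recursive = FALSE)
```

Keeping reusable functions in `R/` and the scripts that call them in `Script/` is one way to follow the functions vs. scripts split; the folder names can be adapted to a project's own conventions.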