Peer-Reviewed

Using Full Cohort Information to Improve the Estimation Efficiency of Marginal Hazard Model for Multivariate Failure Times in Case-Cohort Studies

Received: 8 June 2021     Accepted: 2 December 2021     Published: 24 December 2021
Abstract

The case-cohort design is widely used in large cohort studies when it is prohibitively costly to measure some exposures for all subjects in the full cohort, especially in studies where the disease rate is low. To investigate the effect of a risk factor on different diseases, multiple case-cohort studies using the same subcohort are usually conducted. To compare the effect of a risk factor on different types of diseases, times to different disease events need to be modeled simultaneously. Existing case-cohort estimators for multiple disease outcomes utilize only the relevant covariate information in cases and subcohort controls, though many covariates are measured for everyone in the full cohort. Intuitively, making full use of the relevant covariate information can improve efficiency. To this end, we consider a class of doubly-weighted estimators for both regular and generalized case-cohort studies with multiple disease outcomes. The asymptotic properties of the proposed estimators are derived, and our simulation studies show that a gain in efficiency can be achieved with a properly chosen weight function. We apply the proposed method to re-analyze a data set from the Atherosclerosis Risk in Communities (ARIC) study to showcase the gain in efficiency. Concluding remarks and directions for future research are also discussed.

Published in American Journal of Applied Mathematics (Volume 9, Issue 6)
DOI 10.11648/j.ajam.20210906.11
Page(s) 192-210
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Case-cohort Study, Multiple Disease Outcomes, Survival Analysis

Cite This Article
  • APA Style

    Hongtao Zhang, Haibo Zhou, David Couper, Jianwen Cai. (2021). Using Full Cohort Information to Improve the Estimation Efficiency of Marginal Hazard Model for Multivariate Failure Times in Case-Cohort Studies. American Journal of Applied Mathematics, 9(6), 192-210. https://doi.org/10.11648/j.ajam.20210906.11


    ACS Style

    Hongtao Zhang; Haibo Zhou; David Couper; Jianwen Cai. Using Full Cohort Information to Improve the Estimation Efficiency of Marginal Hazard Model for Multivariate Failure Times in Case-Cohort Studies. Am. J. Appl. Math. 2021, 9(6), 192-210. doi: 10.11648/j.ajam.20210906.11


    AMA Style

    Hongtao Zhang, Haibo Zhou, David Couper, Jianwen Cai. Using Full Cohort Information to Improve the Estimation Efficiency of Marginal Hazard Model for Multivariate Failure Times in Case-Cohort Studies. Am J Appl Math. 2021;9(6):192-210. doi: 10.11648/j.ajam.20210906.11


  • @article{10.11648/j.ajam.20210906.11,
      author = {Hongtao Zhang and Haibo Zhou and David Couper and Jianwen Cai},
      title = {Using Full Cohort Information to Improve the Estimation Efficiency of Marginal Hazard Model for Multivariate Failure Times in Case-Cohort Studies},
      journal = {American Journal of Applied Mathematics},
      volume = {9},
      number = {6},
      pages = {192-210},
      doi = {10.11648/j.ajam.20210906.11},
      url = {https://doi.org/10.11648/j.ajam.20210906.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajam.20210906.11},
     year = {2021}
    }
    


  • TY  - JOUR
    T1  - Using Full Cohort Information to Improve the Estimation Efficiency of Marginal Hazard Model for Multivariate Failure Times in Case-Cohort Studies
    AU  - Hongtao Zhang
    AU  - Haibo Zhou
    AU  - David Couper
    AU  - Jianwen Cai
    Y1  - 2021/12/24
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajam.20210906.11
    DO  - 10.11648/j.ajam.20210906.11
    T2  - American Journal of Applied Mathematics
    JF  - American Journal of Applied Mathematics
    JO  - American Journal of Applied Mathematics
    SP  - 192
    EP  - 210
    PB  - Science Publishing Group
    SN  - 2330-006X
    UR  - https://doi.org/10.11648/j.ajam.20210906.11
    VL  - 9
    IS  - 6
    ER  - 


Author Information
  • Bristol Myers Squibb, Berkeley Heights, New Jersey, USA

  • Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA

  • Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA

  • Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA
