Basic Assumptions and Equations

(a.) Mortality Data Base: historical age-specific cancer rates.

U.S. cancer mortality numbers and populations recorded 1900-2006 by the U.S. Census Bureau (1900-1935) and the U.S. Public Health Service (1936-2006) have been matched and organized with regard to gender, ethnicity, calendar interval of birth, “h” (ten years: 1800-09, 1810-19, …), calendar interval of death, “y” (five years: 1900-04, 1905-09, …), and age-at-death interval, “t” (five years: 0-4, …, 100-104) (Supporting Information Table S1). These data allow computation of raw age-specific lifetime mortality rates, OBS(h,t), as the number of deaths by the observed cause divided by the number of persons alive at the beginning of the one-year interval “t”. Thus OBS(h,t) is an approximation of the conditional probability that a person would have died of the observed cause given that he or she was still alive. However, cancer models predict incidence rates, INC(h,t), as a calculated approximation, CAL(h,t), of conditional rates of death absent covariant factors such as competing forms of death or the effect of medical intervention in the age/time interval observed. Transforming observed raw mortality rates, OBS(h,t), to estimates of incidence rates, INC(h,t), requires correction for several sources of bias. In extreme old age (t = 100-104) total death rates approach ~0.3 per year, and deaths from other causes must have reduced the number of deaths recorded for the observed cause. Correction for this bias consists of determining the total raw mortality rate for each five-year age interval, TOT(h,t), and defining the coincidence-corrected mortality rate at the third or middle year, OBS*(h,t), as OBS(h,t)/[1 - TOT(h,t) + OBS(h,t)]. Accounting for historically improving five-year survival rates, SUR(h,t), is also required for some cancers such as colorectal cancers. The expected incidence rate, INC(h,t), adjusted for these considerations is:

INC(h,t) = OBS(h,t) / ( [1 - SUR(h,t)] [1 - TOT(h,t) + OBS(h,t)] ). Equation 1
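As a rough illustration of Equation 1, the sketch below computes OBS*(h,t) and INC(h,t) for a single (h,t) cell. The function names and the numerical inputs are hypothetical and chosen only to show the arithmetic; they are not values from Table S1.

```python
def coincidence_corrected_rate(obs, tot):
    """OBS*(h,t) = OBS(h,t) / [1 - TOT(h,t) + OBS(h,t)]."""
    return obs / (1.0 - tot + obs)


def incidence_estimate(obs, tot, sur):
    """INC(h,t) per Equation 1: OBS* further divided by [1 - SUR(h,t)]."""
    return obs / ((1.0 - sur) * (1.0 - tot + obs))


# Hypothetical inputs for one cell (illustrative only, not from the data set):
obs = 0.002   # raw mortality rate for the observed cause
tot = 0.30    # total raw mortality rate in the same interval
sur = 0.40    # five-year survival fraction

obs_star = coincidence_corrected_rate(obs, tot)
inc = incidence_estimate(obs, tot, sur)
print(f"OBS* = {obs_star:.6f}, INC = {inc:.6f}")
```

Both corrections can only raise the estimate relative to the raw rate, since each divides by a factor no larger than one.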

Diagnostic errors at death may also be expected, and these would vary with cancer type, age at death, historical year of reporting, etc. INC(h,t) as defined here is therefore an approximation, and its uncertainties must be considered when comparing model predictions, CAL(h,t), to incidence as represented by INC(h,t).

(b.) Algebraic elements of the two-stage model.

Limitation of initiation mutations to the fetal/juvenile stem cell doublings.

Growth of normal fetal/juvenile stem cells is here modeled as a series of “a” net binomial doublings (a = 0, 1, 2, …, a_max) in which “n” required initiation mutations, i, j, …n, occur in any order at constant mutation rates R_i, R_j, …, R_n per doubling. The number of newly initiated stem cells in doubling period “a” is (Π_n R_i) a^(n-1) 2^a. In the fetal/juvenile model, organogenic stem cells are posited to reach maturity, represented by “a_max” doublings with high constant mutation rates, and then to undergo metamorphosis to maintenance stem cells with no additional net cell growth and much lower mutation rates.

Assuming each of the ~10⁷ adult colonic crypts to be represented at juvenile/adult metamorphosis by a single metakaryotic stem cell, the number of net doublings at maturity, a_max, is about 23.25, i.e., 10⁷ ~ 2^23.25. The metakaryotic mutator/hypermutable stem cell lineage of human organ anlagen appears to begin in gestational week 4-5 with creation of two metakaryotic stem cells from symmetrical amitosis of a single precursor embryonic mitotic stem cell at a = 0. At birth, a colon contains ~2²⁰ colonic crypts, each containing a basal metakaryotic stem cell; thus at birth a ~ 20, and at maturity a = a_max ~ 23.25.
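The doubling counts quoted above follow from a base-2 logarithm of the crypt numbers. A minimal check, using only the crypt counts stated in the text (variable names are illustrative):

```python
import math

crypts_at_maturity = 1e7       # ~10^7 adult colonic crypts (from the text)
crypts_at_birth = 2 ** 20      # ~2^20 crypts at birth (from the text)

a_max = math.log2(crypts_at_maturity)    # net doublings at maturity
a_birth = math.log2(crypts_at_birth)     # net doublings at birth
postnatal_doublings = a_max - a_birth    # doublings between birth and maturity

print(f"a_max ~ {a_max:.2f}, a at birth = {a_birth:.0f}, "
      f"postnatal doublings ~ {postnatal_doublings:.2f}")
```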

Promotion mutations during preneoplastic stem cell doublings.

After initiation in any fetal/juvenile doubling “a”, growth of preneoplastic stem cells as a colony is modeled as a series of “g - a” binomial doublings (g - a = 0, 1, 2, …) in which “m” required promotion mutations, A, B, …m, occur at constant mutation rates R_A, R_B, …, R_m per doubling. The expected number of newly promoted stem cells in preneoplastic doubling period “g - a” is (Π_m R_A) (g-a)^(m-1) 2^(g-a). Under these assumptions the number of organogenic doublings “a” at initiation and the number of preneoplastic doublings “g - a” after initiation sum to “g”, which is a very useful continuous variable because it describes the age of humans in terms of continuous stem cell doublings through fetal/juvenile and then preneoplastic growth. In each organogenic doubling interval “a” new preneoplastic colonies are created (initiated), and these colonies grow until promotion and subsequent death remove them. The extinction of preneoplastic colonies at “a” and at “g - a” is driven by the supra-exponential term exp[-(Π_m R_A) (g-a)^(m-1) 2^(g-a)].

If all persons have the same numbers and rates of “n” required initiation and “m” required promotion oncomutations, and all initiated cells grow at the same average rate as preneoplastic stem cells (homogeneous risk) without any synchronously competitive forms of mortality, the expected number of promotional events at the binomial doubling age interval “g”, V(g), may be represented as:

V(g) = (Π_n R_i) Σ(a = 0 to a_max) a^(n-1) 2^a d(1 - exp[-(Π_m R_A) (g-a)^(m-1) 2^(g-a)]) / d(g-a) Equation 2
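A minimal numerical sketch of Equation 2 follows. It rests on several assumptions that are not stated in the text: the sum runs over integer doublings a = 0 … floor(a_max), terms with a > g contribute nothing, and the products Π_n R_i and Π_m R_A are passed as single numbers (`r_init`, `r_promo`, both hypothetical names). The derivative with respect to (g - a) is written out analytically.

```python
import math


def promotion_rate_term(x, m, r_promo):
    """d/dx of 1 - exp(-r_promo * x**(m-1) * 2**x), for x = g - a >= 0.

    m is a positive integer. Very large x would overflow 2.0**x;
    this sketch does not guard against that.
    """
    if x < 0:
        return 0.0  # assumption: no promotion before initiation
    two_x = 2.0 ** x
    if m == 1:
        h = two_x
        dh = two_x * math.log(2.0)
    else:
        h = x ** (m - 1) * two_x
        dh = ((m - 1) * x ** (m - 2) + x ** (m - 1) * math.log(2.0)) * two_x
    return r_promo * dh * math.exp(-r_promo * h)


def v_of_g(g, n, m, r_init, r_promo, a_max):
    """Equation 2 summed over integer organogenic doublings a (an assumption)."""
    total = 0.0
    for a in range(0, int(math.floor(a_max)) + 1):
        if a > g:
            break
        newly_initiated = a ** (n - 1) * 2.0 ** a
        total += newly_initiated * promotion_rate_term(g - a, m, r_promo)
    return r_init * total


# Hypothetical parameter values, chosen only to exercise the functions:
example_v = v_of_g(g=35.0, n=2, m=1, r_init=1e-9, r_promo=1e-6, a_max=23.25)
print(example_v)
```

With m = 1 each term peaks where r_promo · 2^(g-a) = 1 and then collapses, which is the rise and fall with “g - a” that the text attributes to each organogenic doubling.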

This process is illustrated in Figure 2, in which the contribution to promotion at age “g” from initiation at each organogenic doubling “a” is shown to rise and fall with “g - a”. The sum of these terms from initiations in all organogenic doubling intervals “a” approximates well the observed lifetime incidence rate of many cancer types, including colorectal cancer: it increases sub-exponentially, reaches a maximum in old age and declines appreciably in extreme old age. The earliest initiations of fetal organogenesis drive the tumor incidence rate of juveniles and young adults; the initiations of adolescent organogenesis drive the tumor incidence rate in extreme old age.

Under these conditions the expected number of newly promoted lesions through the end of any doubling interval “g”, CAL(g), is:

CAL(g) = 1 - e^(-V(g)) Equation 3

Age of death, “t”, and doubling age of promotion, “g”.

Cancer mortality data corrected for coincident deaths within the year of death, OBS*(h,t), and the derived estimate of incidence, INC(h,t), are calculated in five-year age-of-death intervals (5-9, …, 100-104 years) such that deaths in any age interval are plotted at the mid-interval. CAL(h,g) is, however, approximated as the instantaneous rate of promotion at the end of each stem cell doubling interval “g”.

To account for the difference between age at promotion and death we adopt Armitage and Doll’s estimate of 2.5 yr. Death at age t = 72.5 is thus attributed to promotion at age t = 70.

The relationship between human age at death in years, “t”, and stem cell doubling age at promotion, “g”, is then defined if there is a constant average preneoplastic stem cell annual doubling rate, “μ”. Given the age of maturity for males as 16.5 yr at g = a_max:

g = μ (t - 16.5 - 2.5) + a_max = μ (t - 19) + a_max Equation 5
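The conversion in Equation 5 can be sketched as a small function. The value of μ used in the example is purely hypothetical, since no numerical doubling rate is given in this section; the function name and defaults are likewise illustrative, and the sketch applies only to ages past maturity plus the 2.5 yr lag (t ≥ 19).

```python
def doubling_age(t_death, mu, a_max=23.25, maturity_age=16.5, lag=2.5):
    """Equation 5: g = mu * (t - maturity_age - lag) + a_max.

    t_death      age at death in years (mid-interval)
    mu           preneoplastic doublings per year (hypothetical input)
    maturity_age age at which g = a_max (16.5 yr for males in the text)
    lag          promotion-to-death interval (2.5 yr, Armitage and Doll)
    """
    return mu * (t_death - maturity_age - lag) + a_max


# With a hypothetical mu = 0.2 doublings per year:
g_at_19 = doubling_age(19.0, mu=0.2)      # equals a_max at t = 19
g_at_72_5 = doubling_age(72.5, mu=0.2)    # death at 72.5, promotion at 70
print(g_at_19, g_at_72_5)
```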

Stratification of risks in the population.

We represent the fraction of the population in whom all of the potential conditions necessary for cancer death are present as “F”. The corresponding fraction in which any necessary condition is absent is represented as (1-F). Stratification need not, however, be an “all or none” phenomenon. Stratification with regard to mutation rates in fetal/juvenile expansion has been noted for both mitochondrial and nuclear genes. The use of “F” in the present report serves as a first approximation to stratification from any underlying cause. Equation 4 rewritten to account for stratification in this way creates the model:

CAL(g) = F (1 - e^(-V(g))) / [F + (1-F) e^(∫ V(g) dg)], with the integral evaluated from g = 0 to g. Equation 6.
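One way to evaluate Equation 6 numerically is to integrate V(g) from 0 to g with the trapezoidal rule. The sketch below assumes V is supplied as a Python callable; the linear `toy_v` used in the example is a hypothetical stand-in, not the V(g) of Equation 2, and the function name and step count are illustrative.

```python
import math


def cal_stratified(v, g, fraction_at_risk, steps=2000):
    """Equation 6, with the integral of V from 0 to g by the trapezoidal rule."""
    step = g / steps
    integral = 0.5 * (v(0.0) + v(g))
    for k in range(1, steps):
        integral += v(k * step)
    integral *= step

    f_risk = fraction_at_risk
    numerator = f_risk * (1.0 - math.exp(-v(g)))
    denominator = f_risk + (1.0 - f_risk) * math.exp(integral)
    return numerator / denominator


def toy_v(g):
    """Hypothetical V(g), used only to exercise the function."""
    return 0.01 * g


cal_all_at_risk = cal_stratified(toy_v, 10.0, fraction_at_risk=1.0)
cal_half_at_risk = cal_stratified(toy_v, 10.0, fraction_at_risk=0.5)
print(cal_all_at_risk, cal_half_at_risk)
```

With F = 1 the denominator is 1 and the result reduces to 1 - e^(-V(g)); smaller F lowers the calculated rate, increasingly so as the integral of V grows.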

Competing synchronous forms of mortality.

Epidemiological observations have also demonstrated that forms of cancer may share environmental or inherited risk factors with one another, e.g. breast and ovarian cancers, in which the death rates increase synchronously with age. The term “f” has been introduced to represent the fraction of persons that die of the observed cause among the set of mortal diseases with shared risks and synchronous changes in death rates. Equation 6, rewritten to account for both stratification and a hypothetical synchronous competing form of mortality that shares risk factors with the observed disease, creates the model:

CAL(g) = F (1 - e^(-V(g))) / [F + (1-F) e^(∫ (1/f) V(g) dg)], with the integral evaluated from g = 0 to g. Equation 7.
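As a consistency check, the limiting cases of Equation 7 can be written out. The notation below assumes that “f” is constant in g, so that 1/f can be taken outside the integral; g' is a dummy integration variable introduced here.

```latex
\[
\mathrm{CAL}(g) \;=\;
\frac{F\left(1 - e^{-V(g)}\right)}
     {F + (1-F)\,\exp\!\left(\tfrac{1}{f}\int_{0}^{g} V(g')\,dg'\right)}
\]
% f = 1 (no synchronous competing mortality): the exponent becomes
% \int_0^g V(g')\,dg' and the expression is Equation 6.
\[
f = 1:\quad
\mathrm{CAL}(g) =
\frac{F\left(1 - e^{-V(g)}\right)}
     {F + (1-F)\,\exp\!\left(\int_{0}^{g} V(g')\,dg'\right)}
\]
% F = 1 (whole population at risk): the denominator equals 1 for any f,
% and the expression is Equation 3.
\[
F = 1:\quad
\mathrm{CAL}(g) = 1 - e^{-V(g)}
\]
```

For f < 1 the exponent is larger than in Equation 6, so with F < 1 the calculated rate falls off more strongly at high g.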