MIT Database Administrator: Lohith Kini
This MIT web page has been assembled and is maintained by present and former students, researchers and faculty of the Department of Biological Engineering. It provides an interface between the population experience of common mortal diseases in the United States and Japan and quantitative cascade models based on biological and clinical information about these diseases. (See list of contributors below.) It contains two major elements:
One may select either “View U.S. Mortality Data” or “View Japanese Mortality Data” by clicking on the appropriate icon.
Clicking on "View U.S. Mortality Data" opens a list of forms of mortality as recorded in Vital Statistics of the United States, beginning with "All Causes" and ending with "Senility". This site contains most, but not all, data on the more common forms of cancer and other forms of mortality. Researchers interested in organizing data for any unlisted disease(s) should contact Prof. W. G. Thilly, email@example.com, for advice and/or assistance. We would be pleased to include links to historical databases for other countries.
Clicking on a particular group of diseases such as “Digestive organs and Peritoneum (150-169)” under Malignant Neoplasms displays a more specific list of cancers.
Numbers shown in brackets are the International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes used to categorize diagnoses recorded as the cause of death. In some cases we have combined data from several cancer sites in order to obtain a more complete historical record. For example, Colon Cancer (153) has been recorded separately only since 1958, but when combined with Anal Cancer (154) and Small Intestine (152) it corresponds to the historical category Lower Gastro-intestinal Tract Cancer, giving a continuous record for 1900-2006. Occasionally, the printed record contained obvious typographic errors or nonsensical data. In these cases interpolation was used to fill in the missing values, and the interpolated values are printed in red in the primary record of number of deaths on the Excel sheets named "Raw Data".
Clicking next on a specific cancer site such as “Lower GI Tract” opens a page of summary data recorded from 1900-2010, organized by gender and ethnic group (EA, European-Americans, and NEA, Non-European Americans, predominantly African-Americans) and secondarily by (a.) age of death (the displayed chart), (b.) calendar year of birth and (c.) calendar year of death. Charts for (b.) and (c.) are opened by clicking the desired gender and ethnic group for each category.
Shown are summary charts in which the log10 age-specific mortality rates (annual deaths/100,000 population) on the y-axis are shown as a function of age of death on the x-axis. Each birth decade cohort's age-specific mortality rate is depicted by joined symbols so that the form and historical changes in age-specific lifetime mortality rates for this form of death may be observed in a single chart.
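As a minimal sketch of the quantity plotted on the y-axis, the following Python function converts a death count and a population count into a log10 annual rate per 100,000 population. The function name and the example numbers are hypothetical and are not taken from the database.

```python
import math

def log10_rate_per_100k(deaths, population):
    """Return log10 of annual deaths per 100,000 population."""
    if deaths <= 0 or population <= 0:
        # log10 is undefined for a zero or negative rate
        return None
    rate = deaths / population * 100_000
    return math.log10(rate)

# Hypothetical example: 250 deaths in a population of 1,000,000
# corresponds to 25 deaths per 100,000.
example = log10_rate_per_100k(250, 1_000_000)
```

Returning `None` for a zero count is one possible convention for points that cannot be placed on a logarithmic axis; the charts on this site may treat such points differently.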
Alternatively, one may choose to view the mortality rates of individual birth decade cohorts displayed over calendar years, or the death rates for a specific age interval, e.g. 50-54 yrs, displayed over the entire period of recording.
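The three views (age of death, calendar year of birth, calendar year of death) are linked by simple arithmetic: year of birth is approximately year of death minus age at death. The sketch below assigns a death to a birth decade on that basis. The function name is hypothetical, and the way the Excel files themselves assign deaths to birth decades is not described here, so treat this as an illustration only.

```python
def birth_decade(death_year, age_at_death):
    """Return (start, end) of the birth decade implied by death year and age."""
    # Approximate: ignores the month of birth and the month of death.
    birth_year = death_year - age_at_death
    start = birth_year // 10 * 10
    return start, start + 9

# Hypothetical example: a death at age 52 in calendar year 1947
# implies a birth year near 1895, i.e. the 1890-1899 cohort.
cohort = birth_decade(1947, 52)
```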
Finally, the complete record for any disease may be downloaded to inspect the raw annual data as recorded by the U.S. Census Bureau or U.S. Public Health Service along with several additional ways to view the data.
If desired, all data on this website may be downloaded by clicking the icon “Download all Mortality Data”; the download comprises ~66 MB of Excel(TM) files.
Clicking “CancerFit” below opens a page containing four links.
The first link, when clicked, shows the basic assumptions and equations used in a cascade model including but not limited to the assumptions of CancerFit v.5.0.
THIS LINK MUST BE STUDIED AND UNDERSTOOD BEFORE ANY FURTHER STEPS COULD BE USEFUL.
INC(h,t) is the set of age-specific mortality rates for age-of-death intervals (t = 15-19, 20-24,…,100-104) of a particular population cohort, h, defined by gender, ethnic group and birth decade, e.g. EAM, 1890-99 (European-American males born 1890-99), corrected for (a.) coincident forms of death within each year and (b.) survival due to medical intervention.
CAL(h,t) is the set of age-specific incidence rates predicted by the model as the "best fit" to the data supplied as INC(h,t). In the calculation of CAL(h,t), wide ranges of values are considered for the initiation event rates, R_i, R_j, …, R_n, the promotion event rates, R_A, R_B, …, R_m, the preneoplastic colony growth rates, the fraction, "F", of persons at risk of the particular disease for any combination of required inherited or environmental risks, and a function, "f", that represents the fraction of deaths, within a group subject to synchronously mortal form(s) of disease sharing risks with the disease studied, that is accounted for by the disease studied.
These data are compared by CancerFit v.5.0 to a cascade model that assumes (a.) 'n' initiation mutations are required in an organogenic stem cell during the fetal-juvenile period to create a first preneoplastic stem cell and (b.) 'm' promotion mutations are required in an initiated preneoplastic stem cell to create a first neoplastic (tumor) stem cell. The goodness of fit, GOF(h,t), measures how closely CAL(h,t) matches INC(h,t). It is calculated as the sum of [log(INC(h,t)) - log(CAL(h,t))]^2 over the age-of-death intervals employed in the comparison, divided by the number of those intervals.
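The goodness-of-fit measure described above (squared differences of log rates, summed and divided by the number of age-of-death intervals) can be sketched in a few lines of Python. This is an illustration, not the CancerFit source: the function name is hypothetical, the rates are made-up numbers, and the use of base-10 logarithms is an assumption, since the base is not stated here.

```python
import math

def goodness_of_fit(inc, cal):
    """Mean squared difference of log rates over the intervals compared."""
    if len(inc) != len(cal) or not inc:
        raise ValueError("inc and cal must be non-empty and of equal length")
    # Assumption: base-10 logarithm; the text only says "log".
    squared = [(math.log10(i) - math.log10(c)) ** 2 for i, c in zip(inc, cal)]
    return sum(squared) / len(squared)

# Made-up rates for three age-of-death intervals (not real data)
inc = [10.0, 100.0, 1000.0]
cal = [10.0, 10.0, 1000.0]
gof = goodness_of_fit(inc, cal)
```

In this toy case only the middle interval differs, by one log10 unit, so the value is 1/3. A smaller value indicates closer agreement between CAL(h,t) and INC(h,t).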
The second link, when clicked, will download the entire source code of CancerFit, written for MATLAB v7.6 or higher. The download file (CancerFit v5.0, approximately 66 MB) is a zipped file containing MATLAB source code along with all the mortality data from this M.I.T. repository. An interested user who downloads the zip file must first unzip it; the file is titled CancerFitv5_0.zip. If you are using Mac OS X, the zip file will show up in your Downloads list and will be automatically unzipped and available in the location where your downloaded items are sent. The unzipped folder contains the folders “Mortality Files”, “src” and “util” along with the files “CancerFit.fig” and “CancerFit.m”. The model equations are implemented in the files listed under the “src” folder, and the interface itself is programmed in the files labeled “CancerFit.fig” and “CancerFit.m”. The folder “Mortality Files” contains the mortality and population data for all ~111 diseases available on this website as Excel(TM) and text files, both of which can be directly accessed for analysis by the CancerFit program.
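After unzipping, it can be useful to confirm that the folder has the layout described above before starting MATLAB. The following Python sketch checks for the three folders and two files named in the text. The helper function is hypothetical and is not part of the CancerFit distribution; the folder path passed to it depends on where the archive was unzipped.

```python
from pathlib import Path

# Names taken from the description above; the helper itself is hypothetical.
EXPECTED_DIRS = ["Mortality Files", "src", "util"]
EXPECTED_FILES = ["CancerFit.fig", "CancerFit.m"]

def missing_entries(root):
    """Return the expected folders and files not found directly under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means every expected entry was found; any names returned point to an incomplete download or an unzip into a different location.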
The third link is a tutorial describing the steps a CancerFit user needs to take in order to analyze a particular age-specific lifetime mortality function, using cancer of the lower GI tract in European-American males born 1890-1899 as an example.
The fourth and final link opens a page containing example results obtained on the Cancer of the Lower GI Tract, EAM, birth interval 1890-99 using estimated post-diagnosis five-year survival rates (See Herrero-Jimenez et al., 1998, 2000) to define INC(h,t). The program CancerFit v.5.0 was run iteratively for all twenty-five pairs of different numbers of initiation events (n = 1,2,3,4,5) and promotion events (m = 1,2,3,4,5).
First, the best fits of CAL(h = 1890-99, 15 < t < 104) were calculated for the twenty-five combinations of n = 1-5 and m = 1-5 under the parsimonious conditions of homogeneous risk, F = 1, and no synchronous mortal diseases sharing risk factors with colorectal cancer, f = 1. Values of (Π_i R_i)^(1/n) and (Π_A R_A)^(1/m) were permitted to range from 10^-9 to 10^0, and the range of mu was set at 0.1 to 0.3.
Second, the best fits of CAL(h,t) to INC(h,t) were assessed under the additional assumption of inhomogeneous risk, i.e., the parameter “F” representing a hypothetical fraction of the population at risk was allowed to range from 0 to 1.
Third, we considered the possibility of both population inhomogeneity, F < 1, and a competing synchronous mortal disease having genetic and/or environmental risks shared with colorectal cancer, i.e., the parameter “f” representing this possibility was allowed to range from 0 to 1. This assumption did not, however, further reduce the values of GOF(h,t).
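The iteration over the twenty-five (n, m) pairs can be pictured with the short Python sketch below. It is not CancerFit code: `best_pair` and `toy_gof` are hypothetical names, and `toy_gof` is an arbitrary placeholder standing in for the GOF value that a real CancerFit run would return for each pair. The pair it selects is therefore not a result.

```python
from itertools import product

N_VALUES = range(1, 6)  # numbers of initiation events, n = 1..5
M_VALUES = range(1, 6)  # numbers of promotion events, m = 1..5

def best_pair(gof_for_pair):
    """Return the (n, m) pair with the smallest GOF, plus all 25 results.

    gof_for_pair is a hypothetical callable (n, m) -> GOF value.
    """
    results = {(n, m): gof_for_pair(n, m)
               for n, m in product(N_VALUES, M_VALUES)}
    best = min(results, key=results.get)
    return best, results

def toy_gof(n, m):
    # Arbitrary placeholder for illustration only; not model output.
    return (n - 3) ** 2 + (m - 2) ** 2

best, results = best_pair(toy_gof)
```

The same loop could be repeated for each of the three trial conditions (F = 1 and f = 1; F allowed to vary; both F and f allowed to vary) to compare the resulting GOF values.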
A figure at the bottom of these sample results depicts the degree of concordance of the two trial conditions, given n=2 and m=1, with adult lifetime incidence data, INC(h,t), for lower G.I. tract cancer in European American males born 1890-99. The two conditions are F = 1, f = 1 (population homogeneity, no synchronous competing risk) and F < 1, f = 1 (population inhomogeneity, no synchronous competing risk).
The "cohort allelic sums test" or "CAST" is provided as the following Excel program, CASTAT(c). This test is described in the Mutation Research paper by Thilly & Morgenthaler (2007), "A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST)."
Click here: CASTAT(c)