Katherine L Thompson, Richard Charnigo and Catherine R Linnen
DOI: 10.4172/2155-6180.1000e126
Over the last decade, improvements in sequencing technologies coupled with active development of association mapping methods have made it possible to link genotypes and quantitative traits in humans. Despite substantial progress in the ability to generate and analyze large data sets, however, genotype-phenotype associations are often difficult to find, even in studies with large numbers of individuals and genetic markers. This is due, in part, to the fact that effects of individual loci can be small and/or dependent on genetic variation at other loci or the environment. Tree-based mapping, which uses the evolutionary relatedness of sampled individuals to gain information during association mapping, has the potential to significantly improve our ability to detect loci impacting human traits. However, current tree-based methods are too computationally intensive and inflexible to be of practical use. Here, we compare tree-based methods with more classical approaches for association mapping and discuss how the limitations of these newer methods might be addressed. Ultimately, such advances could deepen our understanding of the molecular mechanisms underlying complex diseases.
Jacob Pedersen, Jakob Bue Bjorner and Karl Bang Christensen
DOI: 10.4172/2155-6180.1000175
Background: Multi-state analyses are used increasingly in areas such as economic, medical, and social research. They provide a powerful analysis for situations where the research subjects move between several distinct states, but results are often complex. The purpose of the current paper is to present a simple descriptive analysis to visualize patterns in the transitions: the Top10 chart. Data on social transfer payments are used to illustrate the approach. Methods: The Top10 chart displays the time spent in each state and is constructed from individual-level data. Persons with the same pattern of transitions between states are grouped together and average durations are calculated. We analyzed data from 4950 Danish employees aged 18-59 years who, during two years of follow-up, could at any time be in one of seven mutually exclusive states: work, unemployment, sick-listing, studying, parent leave, disability pension, and an absorbing state consisting of those who died, retired, or emigrated. Results: The 10 most frequent transitional patterns described 84% of all women and 90% of all men in the sample. For women, the typical patterns involved working throughout the study (61.7%), patterns with sick-listing (12.0%), patterns with unemployment (5.3%), patterns with parent leave (3.6%), and studying (1.5%). For men, the typical patterns involved working throughout (68.8%), sick-listing (9.1%), unemployment (4.7%), parent leave (5.2%), and studying (0.9%). Conclusion: The Top10 chart provides a simple descriptive visualization of complex transitional patterns.
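The grouping step behind a Top10 chart can be sketched in a few lines: persons with the same ordered sequence of states are pooled, and durations are averaged within each pattern. The trajectories, state names and spell lengths below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical individual-level trajectories: each person is a sequence of
# (state, months) spells over the follow-up period.
trajectories = {
    "p1": [("work", 24)],
    "p2": [("work", 24)],
    "p3": [("work", 10), ("sick", 2), ("work", 12)],
    "p4": [("work", 10), ("sick", 2), ("work", 12)],
    "p5": [("work", 5), ("unemployment", 19)],
}

# Group persons by their pattern of transitions (the ordered state sequence).
groups = defaultdict(list)
for person, spells in trajectories.items():
    pattern = tuple(state for state, _ in spells)
    groups[pattern].append([months for _, months in spells])

# Rank patterns by frequency and average the durations within each group.
top10 = []
for pattern, durations in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    n = len(durations)
    avg = [sum(col) / n for col in zip(*durations)]
    top10.append((pattern, n, avg))

most_common_pattern, n_most, avg_most = top10[0]
```

In the real chart, each of the ten most frequent patterns becomes one bar segmented by the average time spent in each state.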
Fitrat Hossain and Rezaul Karim
DOI: 10.4172/2155-6180.1000176
Estimation of the total fertility rate (TFR) of Bangladesh for the year 2011 is the focus of the present study. Determination of the total fertility rate is very important for an overpopulated developing country like Bangladesh, as it indicates the average number of children a woman would bear over her lifetime. We have used Bongaarts' model to determine the TFR by estimating the four indices associated with this model. For Bangladesh, it has been observed that marriage and lactational infecundability play the most vital roles in reducing fertility, at about 52.9% and 43.5%, respectively, followed by contraceptive use (41.1%). It has also been noticed that the TFR of Bangladesh for 2011 is very close to 2.1, the replacement level of fertility.
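Bongaarts' proximate-determinants framework expresses the TFR as the product of four indices applied to total fecundity, TFR = Cm × Cc × Ca × Ci × TF. A minimal sketch, with index values back-calculated loosely from the percentage reductions quoted above; the TF value and the abortion index Ca are assumptions for illustration, not figures from the study:

```python
# Total fecundity TF is conventionally taken as roughly 15.3 births per woman.
TF = 15.3
Cm = 0.471  # index of marriage (a 52.9% reduction corresponds to Cm ~ 0.471)
Ci = 0.565  # index of lactational infecundability (~43.5% reduction)
Cc = 0.589  # index of contraception (~41.1% reduction)
Ca = 1.0    # index of induced abortion (assumed negligible here)

# Each index multiplies fertility down from total fecundity.
TFR = Cm * Cc * Ca * Ci * TF
```

With these placeholder values the product lands in the low twos, illustrating how the combined reductions bring fertility near the replacement level.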
Li Hua Yue, Wenqing He, Duncan Murdoch and Hristo Sendov
DOI: 10.4172/2155-6180.1000177
Variable selection is a difficult problem in building statistical models. Identification of cost efficient diagnostic factors is very important to health researchers, but most variable selection methods do not take into account the cost of collecting data for the predictors. The trade-off between statistical significance and cost of collecting data for a statistical model is our focus. In this paper, we extend the LARS variable selection method to incorporate costs of factors in variable selection, which also works with other methods of variable selection, such as Lasso and adaptive Lasso. A branch and bound search method combined with LARS is employed to select cost-efficient factors. We apply the resulting branching LARS method to a dataset from an Assertive Community Treatment project conducted in Southwestern Ontario to demonstrate the cost-efficient variable selection process, and the results show that a “cheaper” model could be selected by sacrificing a user selected amount of model accuracy.
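The cost-accuracy trade-off at the heart of this approach can be illustrated with a toy search: among all predictor subsets whose fit is within a user-chosen tolerance of the best, pick the cheapest. The predictors, costs and R² values below are entirely hypothetical, and a simple exhaustive search stands in for the paper's branch-and-bound combined with LARS.

```python
# Hypothetical data-collection cost per predictor and model accuracy (R^2)
# for each predictor subset.
cost = {"age": 1, "lab_test": 10, "imaging": 50}
r2 = {
    ("age",): 0.40,
    ("lab_test",): 0.55,
    ("imaging",): 0.58,
    ("age", "lab_test"): 0.60,
    ("age", "imaging"): 0.61,
    ("lab_test", "imaging"): 0.62,
    ("age", "lab_test", "imaging"): 0.63,
}

def cheapest_within(tol):
    # keep subsets within `tol` of the best fit, then minimize total cost
    best = max(r2.values())
    ok = [s for s, v in r2.items() if v >= best - tol]
    return min(ok, key=lambda s: sum(cost[x] for x in s))

strict = cheapest_within(0.0)    # no accuracy sacrificed: full, expensive model
relaxed = cheapest_within(0.03)  # a small sacrifice selects a far cheaper model
```

Sacrificing 0.03 of R² drops the expensive imaging predictor, which is the kind of "cheaper model for a user-selected loss of accuracy" the paper formalizes.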
Esther Herberich and Ludwig A Hothorn
DOI: 10.4172/2155-6180.1000178
Background: Testing the association between a diallelic marker and a censored time-to-event trait is a specific problem in population-based association studies. For a certain gene, the mode of inheritance may be of particular interest. Therefore, the principle of maximum-type tests (or minimum p procedure) is modified for continuous traits, especially for censored time-to-event data.
Results: We propose a Marcus-type multiple contrast test for a single censored time-to-event trait in a population-based study assuming a Cox proportional hazards model. Using simulations, we worked out the limitations of this asymptotic approach: sufficient sample sizes and non-rare alleles are required. A user-friendly implementation of this method is available in the survival and multcomp packages of the statistical software R.
Conclusions: The proposed approach can be used for the analysis of individual SNPs when censored time-to-event data in population-based association studies are of interest. The approach allows both a global claim of association and determination of the particular underlying mode of inheritance. The mode-specific hazard ratios and their lower simultaneous confidence limits provide information about statistical significance and genetic relevance.
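The modes of inheritance enter such a maximum-type test as alternative numerical codings of the three genotypes. The coding values below follow a common convention and are shown only for intuition; they are not the paper's exact contrast matrix.

```python
# Standard genotype codings for the three usual modes of inheritance at a
# diallelic marker with genotypes aa, aA, AA.
codings = {
    "additive":  {"aa": 0.0, "aA": 0.5, "AA": 1.0},
    "dominant":  {"aa": 0.0, "aA": 1.0, "AA": 1.0},
    "recessive": {"aa": 0.0, "aA": 0.0, "AA": 1.0},
}

def code(genotypes, mode):
    # translate a genotype vector into the numeric coding for one mode
    return [codings[mode][g] for g in genotypes]

x = code(["aa", "aA", "AA", "aA"], "dominant")
```

A multiple contrast test fits all three codings and takes the maximum test statistic, which is why it can both declare association globally and point to the best-supported mode.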
Spencer Lourens, Ying Zhang, Jeffrey D Long and Jane S Paulsen
DOI: 10.4172/2155-6180.1000179
Estimating parameters in a mixture of normal distributions dates back to the 19th century when Pearson originally considered data of crabs from the Bay of Naples. Since then, many real world applications of mixtures have led to various proposed methods for studying similar problems. Among them, maximum likelihood estimation (MLE) and the continuous empirical characteristic function (CECF) methods have drawn the most attention. However, the performance of these competing estimation methods has not been thoroughly studied in the literature and conclusions have not been consistent in published research. In this article, we review this classical problem with a focus on estimation bias. An extensive simulation study is conducted to compare the estimation bias between the MLE and CECF methods over a wide range of disparity values. We use the overlapping coefficient (OVL) to measure the amount of disparity, and provide a practical guideline for estimation quality in mixtures of normal distributions. Application to an ongoing multi-site Huntington disease study is illustrated for ascertaining cognitive biomarkers of disease progression.
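The overlapping coefficient for two normal densities can be computed by numerically integrating the pointwise minimum of the two densities; this sketch uses a plain Riemann sum over a wide grid.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def ovl(mu1, sigma1, mu2, sigma2, lo=-10.0, hi=10.0, n=20000):
    # OVL = integral of min(f1, f2); a Riemann sum is enough for a sketch
    h = (hi - lo) / n
    return sum(min(normal_pdf(lo + i * h, mu1, sigma1),
                   normal_pdf(lo + i * h, mu2, sigma2)) for i in range(n)) * h

full = ovl(0, 1, 0, 1)   # identical components: complete overlap
small = ovl(0, 1, 6, 1)  # well-separated components: little overlap
```

OVL near 1 means the components are nearly indistinguishable (low disparity, hard estimation); OVL near 0 means well-separated components.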
Stephan Weinwurm, Johann Sölkner and Patrik Waldmann
DOI: 10.4172/2155-6180.1000180
The goal of genome-wide association studies (GWAS) is to identify the best subset of single-nucleotide polymorphisms (SNPs) that strongly influence a certain trait. State-of-the-art GWAS comprise several thousand or even millions of SNPs, scored on a substantially smaller number of individuals. Hence, the number of variables greatly exceeds the number of observations, which is also known as the p≫n problem.
This problem has been tackled with Bayesian variable selection methods, for example stochastic search variable selection (SSVS) and Bayesian penalized regression methods (the Bayesian lasso, BLA, and Bayesian ridge regression, BRR). Even though the above-mentioned approaches are capable of dealing with situations where p≫n, these methods are also known to experience problems when the predictor variables are correlated. The potential problem that linkage disequilibrium (LD) between SNPs can introduce is often ignored.
The main contribution of this study is to assess the performance of SSVS, BLA, BRR and a recently introduced method denoted hybrid correlation based search (hCBS) with respect to their ability to identify quantitative trait loci, where SNPs are partially highly correlated. Furthermore, each method’s capability to predict phenotypes based on the selected SNPs and their computational demands are studied. Comparison is based upon three simulated datasets where the simulated phenotypes are assumed to be normally distributed.
Results indicate that all methods perform reasonably well with respect to true-positive detections, but often detect too many false positives on all datasets. As the heritability decreases, the Bayesian penalized regression methods are no longer able to detect any predictors because of shrinkage. Overall, BLA slightly outperformed the other methods and provided superior results in terms of the highest true-positive/false-positive ratio, but SSVS achieved the best properties on the real LD data.
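Why correlated predictors trouble shrinkage methods can be seen in a small ridge-regression sketch (ridge being the frequentist analogue of the BRR posterior mean): when two markers are in strong LD, the coefficient mass is shared between them even though only one is causal. The simulated "SNPs" and LD level below are invented.

```python
import random

random.seed(0)
n = 400
# two genotype vectors in roughly 95% LD; only snp1 is causal
snp1 = [float(random.randint(0, 2)) for _ in range(n)]
snp2 = [g if random.random() > 0.05 else float(random.randint(0, 2)) for g in snp1]
y = [g + random.gauss(0, 1) for g in snp1]

lam = 10.0
# normal equations (X'X + lam*I) beta = X'y, solved in closed form for 2x2
a11 = sum(x * x for x in snp1) + lam
a22 = sum(x * x for x in snp2) + lam
a12 = sum(u * v for u, v in zip(snp1, snp2))
b1 = sum(x * t for x, t in zip(snp1, y))
b2 = sum(x * t for x, t in zip(snp2, y))
det = a11 * a22 - a12 * a12
beta1 = (a22 * b1 - a12 * b2) / det
beta2 = (a11 * b2 - a12 * b1) / det
```

The two estimated coefficients sum to roughly the true effect of 1, but the non-causal marker absorbs part of it, which is exactly the false-positive mechanism the comparison study probes.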
Kunihiko Takahashi, Hiroyuki Nakao and Satoshi Hattori
DOI: 10.4172/2155-6180.1000181
In epidemiological studies that measure risk at different levels of exposure, data are often available only in a summarized form that groups the responses into exposure intervals. In typical analyses, the midpoint of each interval is used as its assigned exposure level, and results may be sensitive to this assignment. In this paper, we propose a procedure for assessing J-shaped associations based on likelihood-based assignment of values to grouped exposure intervals, combined with cubic spline regression models. Numerical illustrations and simulation-based comparisons showed that the proposed procedure can yield better curve estimates than those obtained using the typical assignment method based on interval midpoints.
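The conventional midpoint assignment the paper improves upon can be stated in two lines; the exposure intervals and relative risks below are invented to mimic a J-shape.

```python
# Hypothetical grouped exposure intervals and per-interval relative risks.
intervals = [(0, 10), (10, 20), (20, 40), (40, 80)]
risk = [1.4, 1.0, 1.3, 2.2]  # J-shaped: elevated at both low and high exposure

# The typical method: assign each interval its midpoint as the exposure level.
midpoints = [(lo + hi) / 2.0 for lo, hi in intervals]

# Under the midpoint assignment, the nadir of the J-shaped curve sits here:
nadir = midpoints[risk.index(min(risk))]
```

A likelihood-based assignment would replace `midpoints` with values estimated from the within-interval exposure distribution, which can shift the fitted nadir and the shape of the spline curve.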
Simon Thornley, Roger J Marshall, Susan Wells and Rod Jackson
DOI: 10.4172/2155-6180.1000182
By testing for conditional dependence, algorithms can generate directed acyclic graphs (DAGs), which may help inform variable selection when building models for statistical risk prediction or for assessing causal influence. Here, we demonstrate how the method may help us understand the relationship between variables commonly used to predict cardiovascular disease (CVD) risk.
The sample included people aged 30 to 80 years, free of CVD, who had a CVD risk assessment in primary care and at least 2 years of follow-up. The endpoint was combined CVD events, and the other variables were age, sex, diabetes, smoking, ethnic group, preventive drug use (statins or antihypertensives), blood pressure, family history and cholesterol ratio. We used the 'grow-shrink' algorithm in the bnlearn library of the R software to generate a DAG.
A total of 6256 individuals were included, and 101 CVD events occurred during follow-up. The accepted causal associations of tobacco smoking and age with CVD were identified in the DAG. Ethnic group also influenced the risk of CVD events, but it did so indirectly, mediated through the effect of smoking. Drug treatment at baseline was influenced by a wide range of other variables, such as family history of CVD, age and diabetes status, but drug treatment did not have a 'causal' association with CVD events. Algorithms which generate DAGs are a useful adjunct to traditional statistical methods when deciding on the structure of a regression model to test causal hypotheses.
Daniel Cadena Sandoval and Carlos Alfonso Tovilla Zarate
Meta-analysis is a tool for synthesizing the results of previous research into a single overall estimate. It is known to increase statistical power and to provide an estimate of the treatment effect, and it allows the results of studies with differing findings to be combined. In this paper we therefore present some generalities about the development of a meta-analysis and the main points to consider when performing one. Meta-analysis is used in many fields of knowledge, including genome-wide genetic association studies, and is thus a necessary tool for evidence-based medicine.
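The arithmetic core of a fixed-effect meta-analysis is inverse-variance pooling: each study's effect estimate is weighted by the reciprocal of its sampling variance. The study effects and variances below are made up.

```python
import math

# Hypothetical per-study effect sizes and sampling variances.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.09, 0.01]

# Fixed-effect (inverse-variance) pooling.
weights = [1.0 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))  # standard error of the pooled estimate
```

The pooled standard error is smaller than any single study's, which is the power gain the abstract refers to; a random-effects model would additionally widen `se` to account for between-study heterogeneity.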
Masahiro Sugihara, Satoshi Morita, Naoki Yamanouchi, Shinya Sakai, Noriaki Ohba, Wataru Ichikawa and Yasuo Ohashi
Randomized controlled trials are the most scientifically informative studies for evaluating treatment effects. However, observational studies are needed to evaluate unallocatable factors such as genotype, preference, or lifestyle. In observational studies, subject characteristics among the comparison groups might be imbalanced due to non-random allocation. We propose a dynamic registration method to improve comparability among comparison groups when allocation is not possible. The dynamic registration method is based on the minimization method: it decides whether or not to register a new subject on the basis of the background information of the subjects already recruited and of the new subject. Simulation studies were conducted to examine the performance of this method and showed that dynamic registration improves the comparability among comparison groups. The dynamic registration method can be used to enhance the quality of observational studies of unallocatable factors.
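The registration rule can be caricatured as: accept a new subject only if registering them does not worsen covariate imbalance between the comparison groups. The groups, the single balancing factor and the acceptance rule below are illustrative, not the paper's exact minimization scheme.

```python
# Hypothetical registry: two comparison groups, balanced on one binary factor.
registered = {
    "exposed":   [{"male": 1}, {"male": 0}],
    "unexposed": [{"male": 1}, {"male": 1}, {"male": 1}],
}

def imbalance(groups):
    # absolute difference in the factor's mean between the two groups
    means = [sum(s["male"] for s in g) / len(g) for g in groups.values()]
    return abs(means[0] - means[1])

def accept(group, subject):
    # register only if imbalance does not increase
    before = imbalance(registered)
    trial = {k: v + ([subject] if k == group else []) for k, v in registered.items()}
    return imbalance(trial) <= before

a = accept("exposed", {"male": 1})  # a male joining "exposed" improves balance
b = accept("exposed", {"male": 0})  # a female joining "exposed" worsens it
```

A full minimization scheme would combine several factors with weights and possibly a random element, but the accept/reject logic is the same.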
Jessica M. Ketchum, Al M. Best and Viswanathan Ramakrishnan
Data on Heart Rate Variability (HRV) have been used extensively to indirectly assess the autonomic control of the heart. The distributions of HRV measures, such as the RR-interval, are not necessarily normally distributed and current methodology does not typically incorporate this characteristic. In this article, a mixed-effects modeling approach under the assumption of a two-component normal-mixture distribution for the within-subject observations has been proposed. Estimation of the parameters of the model was performed through an application of the EM algorithm, which is different from the traditional EM application for the normal-mixture methods. An application of this method was illustrated and the results from a simulation study were discussed. Differences among other methods were also reviewed.
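The E- and M-steps underlying such an approach can be sketched for the simplest case, a two-component normal mixture with known, equal variances. The full method embeds these steps in a mixed-effects model with a non-traditional EM application, which this toy version does not attempt; the data are invented.

```python
import math

def em_step(data, pi, mu1, mu2, sigma=1.0):
    # E-step: responsibility of component 1 for each point (the normalizing
    # constants cancel because both components share the same sigma)
    def phi(x, mu):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    resp = [pi * phi(x, mu1) / (pi * phi(x, mu1) + (1 - pi) * phi(x, mu2))
            for x in data]
    # M-step: update the mixing weight and the two component means
    pi_new = sum(resp) / len(data)
    mu1_new = sum(r * x for r, x in zip(resp, data)) / sum(resp)
    mu2_new = sum((1 - r) * x for r, x in zip(resp, data)) / (len(data) - sum(resp))
    return pi_new, mu1_new, mu2_new

data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
pi, mu1, mu2 = 0.5, -1.0, 1.0
for _ in range(50):
    pi, mu1, mu2 = em_step(data, pi, mu1, mu2)
```

With well-separated clusters the iteration converges quickly to the two cluster means and an even mixing weight.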
Jing Yang, Xiaofang Ling, Tian Xie, Lishan Yao and Visakan Kadirkamanathan
Thermoanaerobacter sp. X514, capable of fermenting most hexoses and pentoses to produce ethanol, is of critical importance in industrial bioethanol production. This paper provides a detailed investigation of the intracellular metabolic flux activities of X514 under three conditions: a sole glucose substrate, a sole xylose substrate, and mixed glucose and xylose substrates, by means of a Gibbs sampling algorithm and stoichiometric metabolic flux balances. Statistical analysis of the results shows that all the flux distributions are Gaussian or truncated Gaussian under the assumption that the system noise is Gaussian. The pentose phosphate pathway becomes more active when xylose is the sole substrate, whereas the glycolysis pool is more active under the sole glucose condition. Major changes in flux activities occur in the connecting flux between the glycolysis pool and the pentose phosphate metabolite pool, reflecting the balance among the need for NADPH, the requirement for ATP, and the necessity of pyruvate under different substrate conditions. Generation of pyruvate reaches its maximum when glucose alone is fed into the system, indicating the system's preference for glucose consumption. The overall metabolic flux distributions give a vivid picture of the central metabolic map of X514.
Susan Halabi
In many clinical trials, a single endpoint is used to answer the primary question and forms the basis for monitoring the experimental therapy. Many trials are lengthy in duration, and investigators are interested in using an intermediate endpoint for an accelerated approval while relying on the primary endpoint (such as overall survival) for the full approval of the drug by the Food and Drug Administration. We have designed a clinical trial where both the intermediate endpoint (progression-free survival, PFS) and the primary endpoint (overall survival, OS) are used for monitoring the trial so that the overall type I error rate is preserved at the pre-specified alpha level of 0.05. A two-stage procedure is used. In the first stage, the Bonferroni correction was used to allocate the global type I error rate to each of the endpoints. In the next stage, the O'Brien-Fleming approach was used to design the boundaries for the interim and final analyses of each endpoint. Data were generated assuming several parametric copulas with exponential marginals. Different degrees of dependence between OS and PFS, as measured by Kendall's τ, were assumed: 0 (independence), 0.1, 0.3, 0.5 and 0.7. The results of the simulations were robust regardless of the copula that was assumed. We controlled the global type I error rate and the marginal type I error rates for both endpoints under the null hypothesis. In addition, the global power and the individual power for each endpoint were attained at the desired levels under the alternative hypotheses. The approach is applied to an example from a prostate cancer trial.
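The O'Brien-Fleming boundary shape can be sketched directly: the critical value at information fraction t is approximately c/√t, so the interim threshold is much stricter than the final one and very little alpha is spent early. The constant below is illustrative, roughly what one endpoint might receive after a Bonferroni split of 0.05, not a value from the paper.

```python
import math

c = 2.24            # hypothetical final critical value on the z-scale
looks = [0.5, 1.0]  # interim at half the information, then the final analysis

# classical O'Brien-Fleming shape: boundary = c / sqrt(information fraction)
boundaries = [c / math.sqrt(t) for t in looks]
```

The interim boundary (about 3.17) is far above the final one (2.24), which is why a trial rarely stops early under this scheme unless the effect is dramatic.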
Tracy L Bergemann, Paul Bangirana, Michael J Boivin, John E Connett, Bruno J Giordani and Chandy C John
Introduction: Assessment of the effects of disease on neurocognitive outcomes in children over time presents several challenges. These challenges are particularly pronounced when conducting studies in low-income countries, where standardization and validation are required for tests developed originally in high-income countries. We present a statistical methodology to assess multiple neurocognitive outcomes over time. We address standardization and adjustment for age in neurocognitive testing, present a statistical methodology for development of a global neurocognitive score, and assess changes in individual and global neurocognitive scores over time in a cohort of children with cerebral malaria. Methods: Ugandan children with cerebral malaria (CM, N = 44), uncomplicated malaria (UM, N = 54) and community controls (N = 89) were assessed by cognitive tests of working memory, executive attention and tactile learning at 0, 3, 6 and 24 months after recruitment. Tests were previously developed and validated for the local area. Test scores were adjusted for age, and a global score was developed based on the controls that combined the assessments of impairment in each neurocognitive domain. Global normalized Z-scores were computed for each of the three study groups, and model-based tests compared the Z-scores between groups. Results: We found that continuous Z-scores gave more powerful conclusions than previous analyses of the dataset. For example, at all four time points, children with CM had significantly lower global Z-scores than controls and children with UM. Our methods also provide more detailed descriptions of longitudinal trends. For example, the Z-scores of children with CM improved from initial testing to 3 months, but remained at approximately the same level below those of controls or children with UM from 3 to 24 months. Our methods for combining scores are more powerful than tests of individual cognitive domains, as testing of the individual domains revealed differences at only some but not all time points.
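The standardization step, computing per-domain Z-scores against the control group and averaging them into a global score, can be sketched as follows; the domain names and scores are invented, and the real method additionally adjusts for age.

```python
# Hypothetical control-group scores per cognitive domain, and one child's scores.
controls = {"memory": [10, 12, 11, 9], "attention": [20, 22, 18, 20]}
child = {"memory": 8, "attention": 16}

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    # sample standard deviation
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# standardize each domain against the controls, then average into a global score
z_scores = {d: (child[d] - mean(controls[d])) / sd(controls[d]) for d in controls}
global_z = mean(list(z_scores.values()))
```

Combining the domains into one continuous score is what buys the extra power: noise in individual domains partially averages out.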
Shunzo Maetani and John W Gamel
As many cancer patients are now cured, it has become necessary in cancer survival analysis to distinguish between cure and delayed death, which makes a great difference in survival benefit and quality of life. Also, cancer patients must be provided with relevant and comprehensible information to make optimal decisions. For this purpose, the Boag parametric analysis with a cured fraction has emerged as a relevant model. The authors evaluated this model against the Cox model using life-long follow-up data. The parameters of the Boag model provided the comprehensible information patients wish to obtain; in particular, the cure rate served as a useful measure of survival benefit. In contrast, the hazard ratio, a parameter of the Cox model, failed to distinguish cure from delayed death. The Boag model could be extended to regression analysis to evaluate the long-term effects of various factors, including cancer treatment. It could also be extended to predict the overall survival curve and mean survival time using limited follow-up data. In conclusion, the Boag model offered a more relevant measure of the long-term benefit of cancer treatment and other factors than conventional methods, although an ideal model has yet to be developed.
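The defining feature of the Boag (mixture cure) model, a survival curve that plateaus at the cure rate instead of falling to zero, is easy to sketch. The exponential latency distribution, cure rate and hazard below are illustrative choices, not estimates from the paper.

```python
import math

def boag_survival(t, cure_rate, hazard):
    # cured fraction survives indefinitely; the uncured fraction follows an
    # exponential latency distribution (chosen here purely for illustration)
    return cure_rate + (1.0 - cure_rate) * math.exp(-hazard * t)

c = 0.4    # hypothetical cure rate
lam = 0.3  # hypothetical hazard among the uncured

early = boag_survival(1.0, c, lam)
late = boag_survival(50.0, c, lam)  # the curve plateaus at the cure rate
```

A Cox hazard ratio compresses this whole curve into one number, which is why it cannot separate "more patients cured" from "deaths merely delayed".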
Background: Few case-control studies of time-dependent environmental exposures and respiratory outcomes have been performed. Small sample sizes pose modeling challenges for estimating interactions. In contrast, case-crossover studies are well suited to settings where control selection is time consuming and costly and response rates are low.
Objective: To demonstrate the feasibility of daily recruitment of children admitted to hospital with asthma and the validity of the case-crossover methodology for hospital-based studies. Methods: The Melbourne Air Pollen Children and Adolescent Health (MAPCAH) study recruited incident asthma admissions of children and adolescents aged 2–17 years to a tertiary hospital. A case was defined by date of admission, and eligible cases served as their own controls. We used a bi-directional sampling design for control selection. At the time of admission, participants underwent skin prick tests and nasal/throat swabs (NTS) to test for respiratory viruses. Questionnaires collected data on asthma management, family history and environmental characteristics. Daily concentrations of ambient pollen, air pollution and weather variables were also available. Results: 644 children were recruited. More than half (63%) were male, with a mean age of 5.2 (SD 3.3) years. Non-participants were slightly younger at admission (mean age 4.4, SD 2.8, p<0.001), although the absolute differences were small. Participants and non-participants were well balanced on gender. The most common reason for refusal to participate in the study was "causing further distress to the child by skin prick testing". Gender and age distributions were similar to the overall admissions to the tertiary hospital as well as in Victoria. Our study slightly under-represented winter admissions (p<0.001) and over-represented spring admissions (p<0.001). More admissions occurred during the grass pollen season in our study than in general asthma hospital admissions across Victoria (42% versus 22%, p<0.001). Conclusions: The case-crossover method is a highly feasible design for a reasonably sized hospital-based study of children with asthma. MAPCAH has robust internal validity and strong generalizability.
Collection of data on respiratory viruses and pollen exposure at the time of admission on children with asthma provides important information that will have clinical and public health impacts.
Yuejen Zhao and Andy H. Lee
Accurate assessment of the association between exposure and response is central to identifying causality in medical research. The concentration index has been commonly used to study income inequality and socioeconomic-related health inequality. This study generalizes applications of the concentration index to measure the relative and attributable risks for describing exposure-response relationships in medical research. Based on cumulative distribution functions, a new measure of correlation is proposed to quantify the association between exposure and response. The connection between the new and existing measures is discussed. The method enables the semi-parametric analysis of overall association and disparity by risk factors. Both grouped and continuous data situations are considered with two applications. The first example illustrates the relationships between the concentration index and the relative and attributable risks. The second example demonstrates how the concentration index can assist in evaluating the association between radiation dose and the incidence of leukaemia. Logistic regression based decomposition is compared with the new approach. We found the concentration index analysis useful not only for examining socioeconomic determinants of health, but also for assessing quantitative relations between exposures to health risks and ill-health outcomes.
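For grouped data, the concentration index can be computed from group shares, group mean outcomes and fractional ranks, using the standard grouped-data formula C = (2/μ)·Σ f·h·R − 1; the numbers below are invented.

```python
# Groups ordered by increasing exposure: population share and mean outcome.
shares = [0.25, 0.25, 0.25, 0.25]
outcome = [1.0, 2.0, 3.0, 4.0]

# overall mean outcome
mu = sum(f * h for f, h in zip(shares, outcome))

# fractional rank of each group: cumulative share up to its midpoint
ranks, cum = [], 0.0
for f in shares:
    ranks.append(cum + f / 2.0)
    cum += f

# grouped-data concentration index
C = 2.0 / mu * sum(f * h * r for f, h, r in zip(shares, outcome, ranks)) - 1.0
```

C ranges from −1 to 1; a positive value, as here, means the outcome is concentrated among the higher-exposure groups.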
Oyeka ICA
This paper proposes and presents a test statistic that intrinsically and structurally adjusts the usual McNemar test statistic for the possible presence of tied responses between the paired populations of cases and controls, whose responses may be measurements on any scale. The method also enables the researcher to readily estimate not only the probability that, in a randomly selected case-control pair, the case responds positive and the control negative (or the case negative and the control positive), but also, when both case and control have similar responses, the probability that both respond positive or both respond negative. The proposed method, which is shown to be relatively more efficient and hence likely to be more powerful than the usual McNemar test statistic, is illustrated with some data.
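The contrast with the usual McNemar statistic, which discards the concordant (tied) cells, can be seen in a small sketch; the 2×2 counts are hypothetical, and the adjusted statistic of the paper is not reproduced here, only the quantities it makes estimable.

```python
# Hypothetical paired counts: (case, control) response combinations.
both_pos, case_pos_only, control_pos_only, both_neg = 30, 20, 10, 40
n = both_pos + case_pos_only + control_pos_only + both_neg

# the classical McNemar statistic uses only the discordant pairs
mcnemar_chi2 = (case_pos_only - control_pos_only) ** 2 / (case_pos_only + control_pos_only)

# the concordant (tied) cells also carry estimable probabilities
p_case_pos_control_neg = case_pos_only / n
p_case_neg_control_pos = control_pos_only / n
p_both_pos = both_pos / n
p_both_neg = both_neg / n
```

Note that 70 of the 100 pairs are concordant and contribute nothing to the classical statistic, which is the information loss the proposed adjustment targets.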
Oyeka ICA and Okeh UM
This paper proposes and presents a non-parametric statistical method for the analysis of non-homogeneous two-sample data, in which the sampled populations may be measured on as low as the ordinal scale. The test statistic intrinsically and structurally accounts for the possible presence of tied observations between the sampled populations, thereby obviating the need for these populations to be continuous or even numeric. The proposed method can easily be modified for use with data that are not necessarily non-homogeneous. The method is illustrated with some data and shown to compare favorably with some existing methods.
John J Rogus, Shu-Fang Lin and Eduardo K Lacson Jr
Introduction: Methods such as generalized least squares regression and linear mixed models have traditionally been used for analyzing repeated measurement data. However, the computational burden for these procedures can be prohibitively high for large data sets. We propose an efficient, non-parametric method for the analysis of a continuous outcome variable with intrapatient correlation and a dichotomous predictor variable.
Methods: The patient-level values of the dichotomous variable of interest are randomized to generate sets of equally likely permutations of the data under the null hypothesis. For each replication, the test statistic for the dichotomous variable is calculated, and the collection of all such test statistics forms an empirical reference distribution used to assign a p-value to the actual test statistic from the original data. Efficient calculation of the reference distribution is possible by operating on the level of sufficient statistics for the outcome variable, as the dichotomous nature of the predictor variable then allows for rapid recalculation of the test statistic at each replicate. An example based on 629,452 measurements of systolic blood pressure in 39,313 dialysis patients is used for illustration.
Results: The Monte Carlo p-value for a decrease in systolic blood pressure following a decrease in dialysate sodium was 0.04. Other computationally feasible, but inefficient, approaches such as data aggregation and year-over-year comparisons were unable to find a significant association.
Discussion: Monte Carlo simulation offers a valid approach to analyze a continuous outcome variable with intrapatient correlation and a dichotomous predictor of interest. This method can accommodate other predictors through a two-step procedure involving an initial regression analysis. Future work is needed to characterize the power of this approach relative to other methods and to study whether weighting strategies may be helpful in the situation where not all patients contribute the same number of data points to the analysis.
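The key design point, permuting the dichotomous label across patients rather than across individual measurements (which would break the intrapatient correlation), can be sketched as follows. The per-patient summaries, labels and replicate count are invented, and the real method recomputes the statistic from sufficient statistics rather than raw means.

```python
import random
import statistics

random.seed(1)
# hypothetical per-patient summaries of systolic blood pressure
patient_means = [118.0, 121.0, 125.0, 130.0, 131.0, 135.0]
labels = [1, 1, 1, 0, 0, 0]  # 1 = low dialysate sodium (hypothetical grouping)

def diff(labs):
    # group-mean difference, the test statistic of this sketch
    a = [m for m, l in zip(patient_means, labs) if l == 1]
    b = [m for m, l in zip(patient_means, labs) if l == 0]
    return statistics.mean(a) - statistics.mean(b)

observed = diff(labels)

# permute the label at the patient level and build the reference distribution
reps = 2000
extreme = 0
for _ in range(reps):
    perm = labels[:]
    random.shuffle(perm)
    if diff(perm) <= observed:  # one-sided: decrease in blood pressure
        extreme += 1
p_value = extreme / reps
```

With six patients there are only 20 distinct label assignments, so the smallest attainable one-sided p-value is 0.05; the Monte Carlo estimate fluctuates around that.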
Journal of Biometrics & Biostatistics received 3496 citations as per Google Scholar report