Mens X Machina
Recent News
- The “Professor Zoe Dimitriadis Prize for Excellence” awarded to Konstantina Biza, a member of our laboratory
25 Jun 2024: The University of Crete announces the awarding of the “Professor Zoe Dimitriadis Prize for Excellence” for the academic year 2022-2023. The award is granted from the proceeds of the donation of Professor Zoe Dimitriadis to postgraduate students, doctoral candidates and/or postdocs of the University of Crete for publication during the previous academic year of the
- Prof. Tsamardinos was awarded a European Research Council Proof of Concept grant in 2021
22 Jun 2022: A European Research Council Proof of Concept grant was awarded to Professor Tsamardinos in 2021.
- Best paper award at BDA 2021, the 37th French national conference on data management, for the article “On Predictive Explanation of Data Anomalies” by Myrtakis, N., Tsamardinos, I., & Christophides, V.
22 Jun 2022: The article “On Predictive Explanation of Data Anomalies” by Nikos Myrtakis, Ioannis Tsamardinos and Vassilis Christophides received the best paper award at BDA 2021, the 37th French national conference on data management.
- Open positions available: calls for expression of interest
18 Jul 2019: Several contractual positions are open at the MensXMachina research group. Our group is highly multi-disciplinary and multi-national, focusing on data analysis and machine learning with an emphasis on biomedical data. Our scientific goals mainly revolve around: (a) the basic research on causal discovery, causal analysis, and causal modeling, (b) the design of automated machine learning
Our Team
John Fotopoulos
Technical Staff
Department of Computer Science, University of Crete, Hellas
Publications
2023
- A. Ntroumpogiannis, M. Giannoulis, N. Myrtakis, V. Christophides, E. Simon, and I. Tsamardinos, “A Meta-level Analysis of Online Anomaly Detectors,” The VLDB Journal, 2023. doi:10.1007/s00778-022-00773-x
@misc{https://doi.org/10.1007/s00778-022-00773-x,
  added-at = {2023-03-07T22:49:53.000+0100},
  author = {Ntroumpogiannis, Antonios and Giannoulis, Michail and Myrtakis, Nikolaos and Christophides, Vassilis and Simon, Eric and Tsamardinos, Ioannis},
  biburl = {https://www.bibsonomy.org/bibtex/2c6cd4b4041e3e546204b7a86899b350a/mensxmachina},
  copyright = {Creative Commons Attribution 4.0 International},
  doi = {10.1007/s00778-022-00773-x},
  interhash = {b686003d5f8fd9819551157d5e3123b2},
  intrahash = {c6cd4b4041e3e546204b7a86899b350a},
  keywords = {anomalies learning machine},
  publisher = {The VLDB Journal},
  timestamp = {2023-03-07T22:49:53.000+0100},
  title = {A Meta-level Analysis of Online Anomaly Detectors},
  url = {https://link.springer.com/article/10.1007/s00778-022-00773-x},
  year = 2023
}
2022
- S. Bowler, G. Papoutsoglou, A. Karanikas, I. Tsamardinos, M. J. Corley, and L. C. Ndhlovu, “A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity,” Scientific Reports, vol. 12, iss. 1, p. 17480, 2022. doi:10.1038/s41598-022-22201-4
Since the onset of the COVID-19 pandemic, increasing cases with variable outcomes continue globally because of variants and despite vaccines and therapies. There is a need to identify at-risk individuals early that would benefit from timely medical interventions. DNA methylation provides an opportunity to identify an epigenetic signature of individuals at increased risk. We utilized machine learning to identify DNA methylation signatures of COVID-19 disease from data available through NCBI Gene Expression Omnibus. A training cohort of 460 individuals (164 COVID-19-infected and 296 non-infected) and an external validation dataset of 128 individuals (102 COVID-19-infected and 26 non-COVID-associated pneumonia) were reanalyzed. Data was processed using ChAMP and beta values were logit transformed. The JADBio AutoML platform was leveraged to identify a methylation signature associated with severe COVID-19 disease. We identified a random forest classification model from 4 unique methylation sites with the power to discern individuals with severe COVID-19 disease. The average area under the curve of receiver operator characteristic (AUC-ROC) of the model was 0.933 and the average area under the precision-recall curve (AUC-PRC) was 0.965. When applied to our external validation, this model produced an AUC-ROC of 0.898 and an AUC-PRC of 0.864. These results further our understanding of the utility of DNA methylation in COVID-19 disease pathology and serve as a platform to inform future COVID-19 related studies.
@article{bowler2022machine,
  added-at = {2023-03-07T22:52:39.000+0100},
  author = {Bowler, Scott and Papoutsoglou, Georgios and Karanikas, Aristides and Tsamardinos, Ioannis and Corley, Michael J. and Ndhlovu, Lishomwa C.},
  biburl = {https://www.bibsonomy.org/bibtex/224959130925e38210da9cab651bbaaaf/mensxmachina},
  doi = {10.1038/s41598-022-22201-4},
  interhash = {c95ccd60f041a590226ac5efad7c573c},
  intrahash = {24959130925e38210da9cab651bbaaaf},
  issn = {20452322},
  journal = {Scientific Reports},
  keywords = {DNA covid learning machine},
  number = 1,
  pages = {17480},
  refid = {Bowler2022},
  timestamp = {2023-03-07T22:52:39.000+0100},
  title = {A machine learning approach utilizing DNA methylation as an accurate classifier of COVID-19 disease severity},
  url = {https://doi.org/10.1038/s41598-022-22201-4},
  volume = 12,
  year = 2022
}
- M. Karaglani, M. Panagopoulou, C. Cheimonidi, I. Tsamardinos, E. Maltezos, N. Papanas, D. Papazoglou, G. Mastorakos, and E. Chatzaki, “Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning,” Journal of Clinical Medicine, vol. 11, iss. 4, 2022. doi:10.3390/jcm11041045
Background: The need for minimally invasive biomarkers for the early diagnosis of type 2 diabetes (T2DM) prior to the clinical onset and monitoring of β-pancreatic cell loss is emerging. Here, we focused on studying circulating cell-free DNA (ccfDNA) as a liquid biopsy biomaterial for accurate diagnosis/monitoring of T2DM. Methods: ccfDNA levels were directly quantified in sera from 96 T2DM patients and 71 healthy individuals via fluorometry, and then fragment DNA size profiling was performed by capillary electrophoresis. Following this, ccfDNA methylation levels of five β-cell-related genes were measured via qPCR. Data were analyzed by automated machine learning to build classifying predictive models. Results: ccfDNA levels were found to be similar between groups but indicative of apoptosis in T2DM. INS (Insulin), IAPP (Islet Amyloid Polypeptide-Amylin), GCK (Glucokinase), and KCNJ11 (Potassium Inwardly Rectifying Channel Subfamily J member 11) levels differed significantly between groups. AutoML analysis delivered biosignatures including GCK, IAPP and KCNJ11 methylation, with the highest ever reported discriminating performance of T2DM from healthy individuals (AUC 0.927). Conclusions: Our data unravel the value of ccfDNA as a minimally invasive biomaterial carrying important clinical information for T2DM. Upon prospective clinical evaluation, the built biosignature can be disruptive for T2DM clinical management.
@article{jcm11041045,
  added-at = {2022-06-22T10:51:41.000+0200},
  article-number = {1045},
  author = {Karaglani, Makrina and Panagopoulou, Maria and Cheimonidi, Christina and Tsamardinos, Ioannis and Maltezos, Efstratios and Papanas, Nikolaos and Papazoglou, Dimitrios and Mastorakos, George and Chatzaki, Ekaterini},
  biburl = {https://www.bibsonomy.org/bibtex/2fa7bb5fb798e4e91d2532d3115dcbbef/mensxmachina},
  doi = {10.3390/jcm11041045},
  interhash = {f3820dbe8f6b53a53f1671c62d64dfaf},
  intrahash = {fa7bb5fb798e4e91d2532d3115dcbbef},
  issn = {2077-0383},
  journal = {Journal of Clinical Medicine},
  keywords = {biopsy diabetes learning machine mellitus},
  number = 4,
  pubmedid = {35207316},
  timestamp = {2022-06-22T10:51:41.000+0200},
  title = {Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning},
  url = {https://www.mdpi.com/2077-0383/11/4/1045},
  volume = 11,
  year = 2022
}
- J. L. Marshall, B. N. Peshkin, T. Yoshino, J. Vowinckel, H. E. Danielsen, G. Melino, I. Tsamardinos, C. Haudenschild, D. J. Kerr, C. Sampaio, S. Y. Rha, K. T. FitzGerald, E. C. Holland, D. Gallagher, J. Garcia-Foncillas, and H. Juhl, “The Essentials of Multiomics,” The Oncologist, vol. 27, iss. 4, pp. 272-284, 2022. doi:10.1093/oncolo/oyab048
Within the last decade, the science of molecular testing has evolved from single gene and single protein analysis to broad molecular profiling as a standard of care, quickly transitioning from research to practice. Terms such as genomics, transcriptomics, proteomics, circulating omics, and artificial intelligence are now commonplace, and this rapid evolution has left us with a significant knowledge gap within the medical community. In this paper, we attempt to bridge that gap and prepare the physician in oncology for multiomics, a group of technologies that have gone from looming on the horizon to become a clinical reality. The era of multiomics is here, and we must prepare ourselves for this exciting new age of cancer medicine.
@article{10.1093/oncolo/oyab048,
  added-at = {2022-06-22T10:50:12.000+0200},
  author = {Marshall, John L and Peshkin, Beth N and Yoshino, Takayuki and Vowinckel, Jakob and Danielsen, Håvard E and Melino, Gerry and Tsamardinos, Ioannis and Haudenschild, Christian and Kerr, David J and Sampaio, Carlos and Rha, Sun Young and FitzGerald, Kevin T and Holland, Eric C and Gallagher, David and Garcia-Foncillas, Jesus and Juhl, Hartmut},
  biburl = {https://www.bibsonomy.org/bibtex/24d888d87a990372de0d0a08a01774ad6/mensxmachina},
  doi = {10.1093/oncolo/oyab048},
  eprint = {https://academic.oup.com/oncolo/article-pdf/27/4/272/43287416/oyab048.pdf},
  interhash = {f0ee8d8b0e2acf63c050b1f6f58be762},
  intrahash = {4d888d87a990372de0d0a08a01774ad6},
  issn = {1083-7159},
  journal = {The Oncologist},
  keywords = {mensxmachina multi-omics},
  month = {02},
  number = 4,
  pages = {272-284},
  timestamp = {2022-06-22T10:50:12.000+0200},
  title = {{The Essentials of Multiomics}},
  url = {https://doi.org/10.1093/oncolo/oyab048},
  volume = 27,
  year = 2022
}
2021
- L. J. Marcos-Zambrano, K. Karaduzovic-Hadziabdic, T. Loncar Turukalo, P. Przymus, V. Trajkovik, O. Aasmets, M. Berland, A. Gruca, J. Hasic, K. Hron, T. Klammsteiner, M. Kolev, L. Lahti, M. B. Lopes, V. Moreno, I. Naskinova, E. Org, I. Paciência, G. Papoutsoglou, R. Shigdel, B. Stres, B. Vilne, M. Yousef, E. Zdravevski, I. Tsamardinos, E. Carrillo de Santa Pau, M. J. Claesson, I. Moreno-Indias, and J. Truu, “Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment,” Frontiers in Microbiology, vol. 12, 2021. doi:10.3389/fmicb.2021.634511
The number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to deeply explore host-microbiome associations and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all information from these biological datasets, taking into account the peculiarities of microbiome data, i.e., compositional, heterogeneous and sparse nature of these datasets. The possibility of predicting host-phenotypes based on taxonomy-informed feature selection to establish an association between microbiome and predict disease states is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used to predict outputs, such as classification and prediction in microbiology, infer host phenotypes to predict diseases and use microbial communities to stratify patients by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here is more related to the bacterial community, many algorithms could be applied in general, regardless of the feature type. This literature and software review covering this broad topic is aligned with the scoping review methodology. 
The manual identification of data sources has been complemented with: (1) automated publication search through digital libraries of the three major publishers using natural language processing (NLP) Toolkit, and (2) an automated identification of relevant software repositories on GitHub and ranking of the related research papers relying on learning to rank approach.
@article{10.3389/fmicb.2021.634511,
  added-at = {2022-06-22T10:58:03.000+0200},
  author = {Marcos-Zambrano, Laura Judith and Karaduzovic-Hadziabdic, Kanita and Loncar Turukalo, Tatjana and Przymus, Piotr and Trajkovik, Vladimir and Aasmets, Oliver and Berland, Magali and Gruca, Aleksandra and Hasic, Jasminka and Hron, Karel and Klammsteiner, Thomas and Kolev, Mikhail and Lahti, Leo and Lopes, Marta B. and Moreno, Victor and Naskinova, Irina and Org, Elin and Paciência, Inês and Papoutsoglou, Georgios and Shigdel, Rajesh and Stres, Blaz and Vilne, Baiba and Yousef, Malik and Zdravevski, Eftim and Tsamardinos, Ioannis and Carrillo de Santa Pau, Enrique and Claesson, Marcus J. and Moreno-Indias, Isabel and Truu, Jaak},
  biburl = {https://www.bibsonomy.org/bibtex/2b27cd61df0c85a21e0dd04b0fc7dfc6e/mensxmachina},
  doi = {10.3389/fmicb.2021.634511},
  interhash = {9365312756fb3fb9714d2f38a30626eb},
  intrahash = {b27cd61df0c85a21e0dd04b0fc7dfc6e},
  issn = {1664-302X},
  journal = {Frontiers in Microbiology},
  keywords = {applications biomarker disease learning machine microbiome predictive},
  timestamp = {2022-06-22T10:58:03.000+0200},
  title = {Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment},
  url = {https://www.frontiersin.org/article/10.3389/fmicb.2021.634511},
  volume = 12,
  year = 2021
}
- G. Papoutsoglou, M. Karaglani, V. Lagani, N. Thomson, O. Røe, I. Tsamardinos, and E. Chatzaki, “Automated machine learning optimizes and accelerates predictive modeling from COVID-19 high throughput datasets,” Scientific Reports, vol. 11, 2021. doi:10.1038/s41598-021-94501-0
@article{article,
  added-at = {2022-06-22T10:56:54.000+0200},
  author = {Papoutsoglou, Georgios and Karaglani, Makrina and Lagani, Vincenzo and Thomson, Naomi and Røe, Oluf and Tsamardinos, Ioannis and Chatzaki, Ekaterini},
  biburl = {https://www.bibsonomy.org/bibtex/232ca8367a87572429ee46be29bae66af/mensxmachina},
  doi = {10.1038/s41598-021-94501-0},
  interhash = {29657a11a3631c2933d03c8939af5f29},
  intrahash = {32ca8367a87572429ee46be29bae66af},
  journal = {Scientific Reports},
  keywords = {automl learning machine predictive},
  month = {07},
  timestamp = {2022-06-22T10:56:54.000+0200},
  title = {Automated machine learning optimizes and accelerates predictive modeling from COVID-19 high throughput datasets},
  volume = 11,
  year = 2021
}
- M. Papadogiorgaki, M. Venianaki, P. Charonyktakis, M. Antonakakis, I. Tsamardinos, M. E. Zervakis, and V. Sakkalis, “Heart Rate Classification Using ECG Signal Processing and Machine Learning Methods,” in 2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE), 2021, pp. 1-6. doi:10.1109/BIBE52308.2021.9635462
@inproceedings{9635462,
  added-at = {2022-06-22T10:55:58.000+0200},
  author = {Papadogiorgaki, Maria and Venianaki, Maria and Charonyktakis, Paulos and Antonakakis, Marios and Tsamardinos, Ioannis and Zervakis, Michalis E. and Sakkalis, Vangelis},
  biburl = {https://www.bibsonomy.org/bibtex/22feae72b255e7875c2643efa7e6ed788/mensxmachina},
  booktitle = {2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)},
  doi = {10.1109/BIBE52308.2021.9635462},
  interhash = {2eb66ce826c06fda1d9831644de642b5},
  intrahash = {2feae72b255e7875c2643efa7e6ed788},
  keywords = {classification ecg heart processing rate signal},
  pages = {1-6},
  timestamp = {2022-06-22T10:55:58.000+0200},
  title = {Heart Rate Classification Using ECG Signal Processing and Machine Learning Methods},
  year = 2021
}
- K. Rounis, D. Makrakis, C. Papadaki, A. Monastirioti, L. Vamvakas, K. Kalbakis, K. Gourlia, I. Xanthopoulos, I. Tsamardinos, D. Mavroudis, and S. Agelaki, “Prediction of outcome in patients with non-small cell lung cancer treated with second line PD-1/PDL-1 inhibitors based on clinical parameters: Results from a prospective, single institution study,” PLOS ONE, vol. 16, iss. 6, pp. 1-18, 2021. doi:10.1371/journal.pone.0252537
Objective: We prospectively recorded clinical and laboratory parameters from patients with metastatic non-small cell lung cancer (NSCLC) treated with 2nd line PD-1/PD-L1 inhibitors in order to address their effect on treatment outcomes. Materials and methods: Clinicopathological information (age, performance status, smoking, body mass index, histology, organs with metastases), use and duration of proton pump inhibitors, steroids and antibiotics (ATB) and laboratory values [neutrophil/lymphocyte ratio, LDH, albumin] were prospectively collected. Steroid administration was defined as the use of > 10 mg prednisone equivalent for ≥ 10 days. Prolonged ATB administration was defined as ATB ≥ 14 days 30 days before or within the first 3 months of treatment. JADBio, a machine learning pipeline was applied for further multivariate analysis. Results: Data from 66 pts with non-oncogenic driven metastatic NSCLC were analyzed; 15.2% experienced partial response (PR), 34.8% stable disease (SD) and 50% progressive disease (PD). Median overall survival (OS) was 6.77 months. ATB administration did not affect patient OS [HR = 1.35 (CI: 0.761–2.406, p = 0.304)], however, prolonged ATBs [HR = 2.95 (CI: 1.62–5.36, p = 0.0001)] and the presence of bone metastases [HR = 1.89 (CI: 1.02–3.51, p = 0.049)] independently predicted for shorter survival. Prolonged ATB administration, bone metastases, liver metastases and BMI < 25 kg/m2 were selected by JADBio as the important features that were associated with increased probability of developing disease progression as response to treatment. The resulting algorithm that was created was able to predict the probability of disease stabilization (PR or SD) in a single individual with an AUC = 0.806 [95% CI: 0.714–0.889]. Conclusions: Our results demonstrate an adverse effect of prolonged ATBs on response and survival and underscore their importance along with the presence of bone metastases, liver metastases and low BMI in the individual prediction of outcomes in patients treated with immunotherapy.
@article{10.1371/journal.pone.0252537,
  added-at = {2021-06-04T09:18:00.000+0200},
  author = {Rounis, Konstantinos and Makrakis, Dimitrios and Papadaki, Chara and Monastirioti, Alexia and Vamvakas, Lambros and Kalbakis, Konstantinos and Gourlia, Krystallia and Xanthopoulos, Iordanis and Tsamardinos, Ioannis and Mavroudis, Dimitrios and Agelaki, Sofia},
  biburl = {https://www.bibsonomy.org/bibtex/2a0fda17bd6c2177cb4ce435c3559b648/mensxmachina},
  doi = {10.1371/journal.pone.0252537},
  interhash = {c53c8616653bdaaa2984cde14d27d241},
  intrahash = {a0fda17bd6c2177cb4ce435c3559b648},
  journal = {PLOS ONE},
  keywords = {imported},
  month = {06},
  number = 6,
  pages = {1-18},
  publisher = {Public Library of Science},
  timestamp = {2021-06-04T09:18:00.000+0200},
  title = {Prediction of outcome in patients with non-small cell lung cancer treated with second line PD-1/PDL-1 inhibitors based on clinical parameters: Results from a prospective, single institution study},
  url = {https://doi.org/10.1371/journal.pone.0252537},
  volume = 16,
  year = 2021
}
- G. Borboudakis and I. Tsamardinos, "Extending greedy feature selection algorithms to multiple solutions," Data Mining and Knowledge Discovery, 2021. doi:10.1007/s10618-020-00731-7
[BibTeX] [Abstract] [Download PDF]
Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance.
@article{Borboudakis2021, abstract = {Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance.}, added-at = {2021-05-10T09:37:57.000+0200}, author = {Borboudakis, Giorgos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/21a02e4b98901f0889375b61fbba306a2/mensxmachina}, day = 01, doi = {10.1007/s10618-020-00731-7}, interhash = {2111a54b383124f93dad8b9ebd26afb5}, intrahash = {1a02e4b98901f0889375b61fbba306a2}, issn = {1573-756X}, journal = {Data Mining and Knowledge Discovery}, keywords = {mxmcausalpath}, month = may, timestamp = {2021-05-10T09:37:57.000+0200}, title = {Extending greedy feature selection algorithms to multiple solutions}, url = {https://doi.org/10.1007/s10618-020-00731-7}, year = 2021 }
- M. Panagopoulou, M. Karaglani, V. G. Manolopoulos, I. Iliopoulos, I. Tsamardinos, and E. Chatzaki, "Deciphering the Methylation Landscape in Breast Cancer: Diagnostic and Prognostic Biosignatures through Automated Machine Learning," Cancers, vol. 13, iss. 7, p. 1677, 2021. doi:10.3390/cancers13071677
[BibTeX] [Abstract] [Download PDF]
DNA methylation plays an important role in breast cancer (BrCa) pathogenesis and could contribute to driving its personalized management. We performed a complete bioinformatic analysis in BrCa whole methylome datasets, analyzed using the Illumina methylation 450 bead-chip array. Differential methylation analysis vs. clinical end-points resulted in 11,176 to 27,786 differentially methylated genes (DMGs). Innovative automated machine learning (AutoML) was employed to construct signatures with translational value. Three highly performing and low-feature-number signatures were built: (1) A 5-gene signature discriminating BrCa patients from healthy individuals (area under the curve (AUC): 0.994 (0.982–1.000)). (2) A 3-gene signature identifying BrCa metastatic disease (AUC: 0.986 (0.921–1.000)). (3) Six equivalent 5-gene signatures diagnosing early disease (AUC: 0.973 (0.920–1.000)). Validation in independent patient groups verified performance. Bioinformatic tools for functional analysis and protein interaction prediction were also employed. All protein encoding features included in the signatures were associated with BrCa-related pathways. Functional analysis of DMGs highlighted the regulation of transcription as the main biological process, the nucleus as the main cellular component and transcription factor activity and sequence-specific DNA binding as the main molecular functions. Overall, three high-performance diagnostic/prognostic signatures were built and are readily available for improving BrCa precision management upon prospective clinical validation. Revisiting archived methylomes through novel bioinformatic approaches revealed significant clarifying knowledge for the contribution of gene methylation events in breast carcinogenesis.
@article{Panagopoulou_2021, abstract = {DNA methylation plays an important role in breast cancer (BrCa) pathogenesis and could contribute to driving its personalized management. We performed a complete bioinformatic analysis in BrCa whole methylome datasets, analyzed using the Illumina methylation 450 bead-chip array. Differential methylation analysis vs. clinical end-points resulted in 11,176 to 27,786 differentially methylated genes (DMGs). Innovative automated machine learning (AutoML) was employed to construct signatures with translational value. Three highly performing and low-feature-number signatures were built: (1) A 5-gene signature discriminating BrCa patients from healthy individuals (area under the curve (AUC): 0.994 (0.982–1.000)). (2) A 3-gene signature identifying BrCa metastatic disease (AUC: 0.986 (0.921–1.000)). (3) Six equivalent 5-gene signatures diagnosing early disease (AUC: 0.973 (0.920–1.000)). Validation in independent patient groups verified performance. Bioinformatic tools for functional analysis and protein interaction prediction were also employed. All protein encoding features included in the signatures were associated with BrCa-related pathways. Functional analysis of DMGs highlighted the regulation of transcription as the main biological process, the nucleus as the main cellular component and transcription factor activity and sequence-specific DNA binding as the main molecular functions. Overall, three high-performance diagnostic/prognostic signatures were built and are readily available for improving BrCa precision management upon prospective clinical validation. Revisiting archived methylomes through novel bioinformatic approaches revealed significant clarifying knowledge for the contribution of gene methylation events in breast carcinogenesis.}, added-at = {2021-04-05T10:25:29.000+0200}, author = {Panagopoulou, Maria and Karaglani, Makrina and Manolopoulos, Vangelis G. 
and Iliopoulos, Ioannis and Tsamardinos, Ioannis and Chatzaki, Ekaterini}, biburl = {https://www.bibsonomy.org/bibtex/25938c275248de01841423c461744c95c/mensxmachina}, doi = {10.3390/cancers13071677}, interhash = {9a46961bf0583786199d3b4d978bcb01}, intrahash = {5938c275248de01841423c461744c95c}, journal = {Cancers}, keywords = {imported}, month = apr, number = 7, pages = 1677, publisher = {{MDPI} {AG}}, timestamp = {2021-04-05T10:25:29.000+0200}, title = {Deciphering the Methylation Landscape in Breast Cancer: Diagnostic and Prognostic Biosignatures through Automated Machine Learning}, url = {https://doi.org/10.3390%2Fcancers13071677}, volume = 13, year = 2021 }
- G. Borboudakis and I. Tsamardinos, "Extending Greedy Feature Selection Algorithms to Multiple Solutions," Data Mining and Knowledge Discovery, to appear, 2021.
[BibTeX]
@article{borboudakis2021mining, added-at = {2021-03-17T12:12:52.000+0100}, author = {Borboudakis, G and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/295b55379724af7ef52054e5a33fd4745/mensxmachina}, interhash = {2111a54b383124f93dad8b9ebd26afb5}, intrahash = {95b55379724af7ef52054e5a33fd4745}, journal = {Data Mining and Knowledge Discovery}, keywords = {mxmcausalpath}, timestamp = {2021-03-18T10:07:49.000+0100}, title = {Extending Greedy Feature Selection Algorithms to Multiple Solutions}, volume = {to appear}, year = 2021 }
- N. Myrtakis, I. Tsamardinos, and V. Christophides, "PROTEUS: Predictive Explanation of Anomalies," in IEEE 37th International Conference on Data Engineering (ICDE), 2021.
[BibTeX] [Abstract]
Numerous algorithms have been proposed for detecting anomalies (outliers, novelties) in an unsupervised manner. Unfortunately, it is not trivial, in general, to understand why a given sample (record) is labelled as an anomaly and thus diagnose its root causes. We propose the following reduced-dimensionality, surrogate model approach to explain detector decisions: approximate the detection model with another one that employs only a small subset of features. Subsequently, samples can be visualized in this low-dimensionality space for human understanding. To this end, we develop PROTEUS, an AutoML pipeline to produce the surrogate model, specifically designed for feature selection on imbalanced datasets. The PROTEUS surrogate model can not only explain the training data, but also the out-of-sample (unseen) data. In other words, PROTEUS produces predictive explanations by approximating the decision surface of an unsupervised detector. PROTEUS is designed to return an accurate estimate of out-of-sample predictive performance to serve as a metric of the quality of the approximation. Computational experiments confirm the efficacy of PROTEUS to produce predictive explanations for different families of detectors and to reliably estimate their predictive performance in unseen data. Unlike several ad-hoc feature importance methods, PROTEUS is robust to high-dimensional data.
@conference{myrtakis2021proteus, abstract = {Numerous algorithms have been proposed for detecting anomalies (outliers, novelties) in an unsupervised manner. Unfortunately, it is not trivial, in general, to understand why a given sample (record) is labelled as an anomaly and thus diagnose its root causes. We propose the following reduced-dimensionality, surrogate model approach to explain detector decisions: approximate the detection model with another one that employs only a small subset of features. Subsequently, samples can be visualized in this low-dimensionality space for human understanding. To this end, we develop PROTEUS, an AutoML pipeline to produce the surrogate model, specifically designed for feature selection on imbalanced datasets. The PROTEUS surrogate model can not only explain the training data, but also the out-of-sample (unseen) data. In other words, PROTEUS produces predictive explanations by approximating the decision surface of an unsupervised detector. PROTEUS is designed to return an accurate estimate of out-of-sample predictive performance to serve as a metric of the quality of the approximation. Computational experiments confirm the efficacy of PROTEUS to produce predictive explanations for different families of detectors and to reliably estimate their predictive performance in unseen data. Unlike several ad-hoc feature importance methods, PROTEUS is robust to high-dimensional data. }, added-at = {2021-02-10T09:57:44.000+0100}, author = {Myrtakis, N and Tsamardinos, I and Christophides, V}, biburl = {https://www.bibsonomy.org/bibtex/207bdf48e36b94f93849856e1a1ec258a/mensxmachina}, interhash = {1be3182c1d6928ec21142b5f18a6ea20}, intrahash = {07bdf48e36b94f93849856e1a1ec258a}, keywords = {anomalies}, timestamp = {2021-03-19T10:32:22.000+0100}, title = {"PROTEUS: Predictive Explanation of Anomalies,"}, volume = {IEEE 37th International Conference on Data Engineering (ICDE) 2021}, year = 2021 }
2020
- A. Tsourtis, Y. Pantazis, and I. Tsamardinos, "Inference of Stochastic Dynamical Systems from Cross-Sectional Population Data," arXiv:2012.05055v1 [cs.LG], 9 Dec 2020.
[BibTeX] [Abstract]
Inferring the driving equations of a dynamical system from population or time-course data is important in several scientific fields such as biochemistry, epidemiology, financial mathematics and many others. Despite the existence of algorithms that learn the dynamics from trajectorial measurements there are few attempts to infer the dynamical system straight from population data. In this work, we deduce and then computationally estimate the Fokker-Planck equation which describes the evolution of the population’s probability density, based on stochastic differential equations. Then, following the USDL approach [22], we project the Fokker-Planck equation to a proper set of test functions, transforming it into a linear system of equations. Finally, we apply sparse inference methods to solve the latter system and thus induce the driving forces of the dynamical system. Our approach is illustrated in both synthetic and real data including non-linear, multimodal stochastic differential equations, biochemical reaction networks as well as mass cytometry biological measurements.
@article{tsourtis2020inference, abstract = {Inferring the driving equations of a dynamical system from population or time-course data is important in several scientific fields such as biochemistry, epidemiology, financial mathematics and many others. Despite the existence of algorithms that learn the dynamics from trajectorial measurements there are few attempts to infer the dynamical system straight from population data. In this work, we deduce and then computationally estimate the Fokker-Planck equation which describes the evolution of the population’s probability density, based on stochastic differential equations. Then, following the USDL approach [22], we project the Fokker-Planck equation to a proper set of test functions, transforming it into a linear system of equations. Finally, we apply sparse inference methods to solve the latter system and thus induce the driving forces of the dynamical system. Our approach is illustrated in both synthetic and real data including non-linear, multimodal stochastic differential equations, biochemical reaction networks as well as mass cytometry biological measurements.}, added-at = {2021-03-24T10:32:03.000+0100}, author = {Tsourtis, A and Pantazis, Y and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2f3d7571025e47ab9693c1b8a5876702d/mensxmachina}, doi = {arXiv:2012.05055v1 [cs.LG] 9 Dec 2020}, interhash = {1dd0cba1cddecc67bc714ff55e2fa939}, intrahash = {f3d7571025e47ab9693c1b8a5876702d}, journal = {arXiv:2012.05055v1 [cs.LG] 9 Dec 2020}, keywords = {mxmcausalpath}, timestamp = {2021-03-24T10:32:03.000+0100}, title = {Inference of Stochastic Dynamical Systems from Cross-Sectional Population Data }, year = 2020 }
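The projection step described in the abstract of Tsourtis et al. above can be written out roughly as follows. This is only a sketch under our own assumptions: the notation (drift f, diffusion matrix D, test functions φ_m, dictionary functions ψ_k, coefficients θ_k) is not taken from the paper, D is treated as known, and boundary terms are assumed to vanish when integrating by parts. The paper's actual formulation may differ.

```latex
% Assumed model (illustrative notation, not the paper's):
%   dX_t = f(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad D(x) = \sigma(x)\sigma(x)^{\top}

% Fokker-Planck equation for the population density p(x,t):
\begin{equation}
\partial_t p(x,t) = -\nabla\cdot\big(f(x)\,p(x,t)\big)
  + \tfrac{1}{2}\sum_{i,j}\partial_{x_i}\partial_{x_j}\big(D_{ij}(x)\,p(x,t)\big)
\end{equation}

% Projection onto a smooth test function \phi_m (integration by parts):
\begin{equation}
\frac{d}{dt}\,\mathbb{E}\big[\phi_m(X_t)\big]
  = \mathbb{E}\big[f(X_t)\cdot\nabla\phi_m(X_t)\big]
  + \tfrac{1}{2}\,\mathbb{E}\big[\operatorname{tr}\big(D(X_t)\,\nabla^2\phi_m(X_t)\big)\big]
\end{equation}

% With a hypothetical dictionary expansion f(x) = \sum_k \theta_k \psi_k(x),
% each test function contributes one linear equation in the unknowns \theta_k:
\begin{equation}
\sum_k \theta_k\,\mathbb{E}\big[\psi_k(X_t)\cdot\nabla\phi_m(X_t)\big]
  = \frac{d}{dt}\,\mathbb{E}\big[\phi_m(X_t)\big]
  - \tfrac{1}{2}\,\mathbb{E}\big[\operatorname{tr}\big(D(X_t)\,\nabla^2\phi_m(X_t)\big)\big]
\end{equation}
```

In this reading, the expectations are estimated by sample averages over the cross-sectional snapshots, and a sparse solver applied to the resulting linear system picks out the few active dictionary terms.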
- M. Tsagris, Z. Papadovasilakis, K. Lakiotaki, and I. Tsamardinos, "The γ-OMP algorithm for feature selection with application to gene expression data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2020. doi:10.1109/TCBB.2020.3029952
[BibTeX] [Abstract] [Download PDF]
Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly-scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, time-to-event, (b) discrete (categorical) features, (c) different statistical-based stopping criteria, (d) several predictive models (e.g., linear or logistic regression), (e) various types of residuals, and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par, or outperforms LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas, it is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.
@article{tsagris2020algorithm, abstract = {Feature selection for predictive analytics is the problem of identifying a minimal-size subset of features that is maximally predictive of an outcome of interest. To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of features. In this paper, we propose γ-OMP, a generalisation of the highly-scalable Orthogonal Matching Pursuit feature selection algorithm. γ-OMP can handle (a) various types of outcomes, such as continuous, binary, nominal, time-to-event, (b) discrete (categorical) features, (c) different statistical-based stopping criteria, (d) several predictive models (e.g., linear or logistic regression), (e) various types of residuals, and (f) different types of association. We compare γ-OMP against LASSO, a prototypical, widely used algorithm for high-dimensional data. On both simulated data and several real gene expression datasets, γ-OMP is on par, or outperforms LASSO in binary classification (case-control data), regression (quantified outcomes), and time-to-event data (censored survival times). γ-OMP is based on simple statistical ideas, it is easy to implement and to extend, and our extensive evaluation shows that it is also effective in bioinformatics analysis settings.}, added-at = {2021-03-22T13:27:44.000+0100}, author = {Tsagris, Michail and Papadovasilakis, Zacharias and Lakiotaki, Kleanthi and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2372b4dd105cf55a3c32ca0d937888f2e/mensxmachina}, doi = {10.1109/TCBB.2020.3029952}, interhash = {9bef7f59658d9a4a2f82cba160e276e4}, intrahash = {372b4dd105cf55a3c32ca0d937888f2e}, journal = { IEEE/ACM Transactions on Computational Biology and Bioinformatics }, keywords = {mxmcausalpath}, timestamp = {2021-03-22T13:27:44.000+0100}, title = {The γ-OMP algorithm for feature selection with application to gene expression data}, url = {https://ieeexplore.ieee.org/document/9219177/authors#authors}, year = 2020 }
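For readers unfamiliar with Orthogonal Matching Pursuit, the algorithm that γ-OMP generalises according to the abstract above, the following is a minimal sketch of plain OMP for a continuous outcome with a linear model. It is not the authors' γ-OMP: the function names are hypothetical, the stopping rule is a fixed number of features rather than a statistical criterion, and the least-squares refit assumes the selected columns are non-zero and linearly independent.

```python
def solve_least_squares(columns, y):
    """Solve the normal equations (A^T A) b = A^T y by Gauss-Jordan elimination.

    `columns` is a list of feature columns; assumes they are linearly independent.
    """
    k = len(columns)
    # Augmented matrix [A^T A | A^T y].
    m = [
        [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(a * t for a, t in zip(columns[i], y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def omp_select(features, y, n_select):
    """Plain OMP: greedily pick the feature most correlated with the residual.

    Returns the selected column indices (in selection order) and the
    least-squares coefficients of the final refit on those columns.
    """
    selected, coefs = [], []
    residual = list(y)
    for _ in range(n_select):
        def score(j):
            col = features[j]
            norm = sum(v * v for v in col) ** 0.5  # assumes a non-zero column
            return abs(sum(v * r for v, r in zip(col, residual))) / norm

        candidates = [j for j in range(len(features)) if j not in selected]
        selected.append(max(candidates, key=score))
        cols = [features[j] for j in selected]
        # Refit on all selected features, then recompute the residual.
        coefs = solve_least_squares(cols, y)
        fitted = [sum(c * col[i] for c, col in zip(coefs, cols)) for i in range(len(y))]
        residual = [t - f for t, f in zip(y, fitted)]
    return selected, coefs


# Toy data: the outcome depends only on features 0 and 2.
x0 = [1.0, 0.0, 0.0, 1.0, 2.0, -1.0]
x1 = [0.5, -1.0, 2.0, 0.0, 1.0, 1.0]
x2 = [0.0, 1.0, 0.0, -1.0, 1.0, 2.0]
y = [2 * a - 3 * c for a, c in zip(x0, x2)]
selected, coefs = omp_select([x0, x1, x2], y, n_select=2)
print(selected, coefs)
```

Refitting on all selected features at every step is what makes the pursuit "orthogonal": the new residual is orthogonal to every column already chosen, so no feature is selected twice for the same signal.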
- Y. Pantazis, C. Tselas, K. Lakiotaki, V. Lagani, and I. Tsamardinos, "Latent Feature Representations for Human Gene Expression Data Improve Phenotypic Predictions," IEEE, 2020. doi:10.1109/BIBM49941.2020.9313286
[BibTeX] [Abstract] [Download PDF]
High-throughput technologies such as microarrays and RNA-sequencing (RNA-seq) allow precise quantification of transcriptomic profiles, generating datasets that are inevitably high-dimensional. In this work, we investigate whether the whole human transcriptome can be represented in a compressed, low dimensional latent space without losing relevant information. We thus constructed low-dimensional latent feature spaces of the human genome, by utilizing three dimensionality reduction approaches and a diverse set of curated datasets. We applied standard Principal Component Analysis (PCA), kernel PCA and Autoencoder Neural Networks on 1360 datasets from four different measurement technologies. The latent feature spaces are tested for their ability to (a) reconstruct the original data and (b) improve predictive performance on validation datasets not used during the creation of the feature space. While linear techniques show better reconstruction performance, nonlinear approaches, particularly neural-based models, seem to be able to capture non-additive interaction effects, and thus enjoy stronger predictive capabilities. Despite the limited sample size of each dataset and the biological / technological heterogeneity across studies, our results show that low dimensional representations of the human transcriptome can be achieved by integrating hundreds of datasets. The created space is two to three orders of magnitude smaller compared to the raw data, offering the ability of capturing a large portion of the original data variability and eventually reducing computational time for downstream analyses.
@article{pantazis2020latent, abstract = {High-throughput technologies such as microarrays and RNA-sequencing (RNA-seq) allow to precisely quantify transcriptomic profiles, generating datasets that are inevitably high-dimensional. In this work, we investigate whether the whole human transcriptome can be represented in a compressed, low dimensional latent space without loosing relevant information. We thus constructed low-dimensional latent feature spaces of the human genome, by utilizing three dimensionality reduction approaches and a diverse set of curated datasets. We applied standard Principal Component Analysis (PCA), kernel PCA and Autoencoder Neural Networks on 1360 datasets from four different measurement technologies. The latent feature spaces are tested for their ability to (a) reconstruct the original data and (b) improve predictive performance on validation datasets not used during the creation of the feature space. While linear techniques show better reconstruction performance, nonlinear approaches, particularly, neural-based models seem to be able to capture non-additive interaction effects, and thus enjoy stronger predictive capabilities. Despite the limited sample size of each dataset and the biological / technological heterogeneity across studies, our results show that low dimensional representations of the human transcriptome can be achieved by integrating hundreds of datasets. 
The created space is two to three orders of magnitude smaller compared to the raw data, offering the ability of capturing a large portion of the original data variability and eventually reducing computational time for downstream analyses.}, added-at = {2021-01-27T08:25:38.000+0100}, author = {Pantazis, Yannis and Tselas, Christos and Lakiotaki, Kleanthi and Lagani, Vincenzo and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/22e00727d34af38370524ab45428d1935/mensxmachina}, doi = {10.1109/BIBM49941.2020.9313286}, interhash = {85456c0fc077102f3eca5cd7f7dfc749}, intrahash = {2e00727d34af38370524ab45428d1935}, journal = {IEEE}, keywords = {mxmcausalpath}, timestamp = {2021-03-08T12:07:50.000+0100}, title = {Latent Feature Representations for Human Gene Expression Data Improve Phenotypic Predictions}, url = {https://ieeexplore.ieee.org/document/9313286}, year = 2020 }
- N. Planell, V. Lagani, P. Sebastian-Leon, F. Van der Kloet, E. Ewing, N. Karathanasis, A. Urdangarin, I. Arozarena, M. Jagodic, I. Tsamardinos, S. Tarazona, A. Conesa, J. Tegner, and D. Gomez-Cabrero, "STATegra: Multi-omics data integration - A conceptual scheme and a bioinformatics pipeline," Frontiers in Genetics, to appear, 2020. doi:10.1101/2020.11.20.391045
[BibTeX] [Abstract] [Download PDF]
Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. It is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming it to be as generic as possible for multi-omics analysis, combining machine learning component analysis, non-parametric data combination and a multi-omics exploratory analysis in a step-wise manner. While in several studies we have previously combined those integrative tools, here we provide a systematic description of the STATegra framework and its validation using two TCGA case studies. For both, the Glioblastoma and the Skin Cutaneous Melanoma cases, we demonstrate an enhanced capacity to identify features in comparison to single-omics analysis. Such an integrative multi-omics analysis framework for the identification of features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled, and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package https://bioconductor.org/packages/release/bioc/html/STATegra.html.
@article{noauthororeditor, abstract = {Technologies for profiling samples using different omics platforms have been at the forefront since the human genome project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. It is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming it to be as generic as possible for multi-omics analysis, combining machine learning component analysis, non-parametric data combination and a multi-omics exploratory analysis in a step-wise manner. While in several studies we have previously combined those integrative tools, here we provide a systematic description of the STATegra framework and its validation using two TCGA case studies. For both, the Glioblastoma and the Skin Cutaneous Melanoma cases, we demonstrate an enhanced capacity to identify features in comparison to single-omics analysis. Such an integrative multi-omics analysis framework for the identification of features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled, and for the case when not all the samples are profiled for all omics. 
The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package https://bioconductor.org/packages/release/bioc/html/STATegra.html.}, added-at = {2021-01-25T08:02:51.000+0100}, author = {Planell, Nuria and Lagani, Vincenzo and Sebastian-Leon, Patricia and Van der Kloet, Frans and Ewing, Ewoud and Karathanasis, Nestoras and Urdangarin, Arantxa and Arozarena, Imanol and Jagodic, Maja and Tsamardinos, Ioannis and Tarazona, Sonia and Conesa, Ana and Tegner, Jesper and Gomez-Cabrero, David}, biburl = {https://www.bibsonomy.org/bibtex/213d5658c490ee48b134629c33979e700/mensxmachina}, doi = {10.1101/2020.11.20.391045}, interhash = {84dd53162ecf2659ffb75f1329f0aaad}, intrahash = {13d5658c490ee48b134629c33979e700}, journal = {Frontiers in Genetics}, keywords = {data multi-omics}, timestamp = {2021-01-25T08:02:51.000+0100}, title = {STATegra: Multi-omics data integration - A conceptual scheme and a bioinformatics pipeline}, url = {https://www.biorxiv.org/content/10.1101/2020.11.20.391045v1}, volume = {to appear}, year = 2020 }
- K. I. Karstoft, I. Tsamardinos, K. Eskelund, S. B. Andersen, and L. R. Nissen, "Applicability of an Automated Model and Parameter Selection in the Prediction of Screening-Level PTSD in Danish Soldiers Following Deployment: Development Study of Transferable Predictive Models Using Automated Machine Learning," JMIR Medical Informatics, vol. 8, iss. 7, 2020. doi:10.2196/17119
[BibTeX] [Abstract] [Download PDF]
Background: Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening with the aim of identifying those at risk for PTSD in the years following deployment will help deliver interventions to those in need but have so far proved unsuccessful. Objective: This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. Methods: Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). Results: Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. Conclusions: Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense with the aim of targeting postdeployment support interventions to those at highest risk for developing PTSD, provided the cohorts are deployed on similar missions.
@article{karstoft2020applicability, abstract = {Background: Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening with the aim of identifying those at risk for PTSD in the years following deployment will help deliver interventions to those in need but have so far proved unsuccessful. Objective: This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. Methods: Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). Results: Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. 
Conclusions: Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense with the aim of targeting postdeployment support interventions to those at highest risk for developing PTSD, provided the cohorts are deployed on similar missions.}, added-at = {2020-11-04T15:45:03.000+0100}, author = {Karstoft, KI and Tsamardinos, I and Eskelund, K and Andersen, SB and Nissen, LR}, biburl = {https://www.bibsonomy.org/bibtex/2b3c6a7c433dc0137e177a389e93373d6/mensxmachina}, doi = {10.2196/17119}, interhash = {e4d28b268e9ea645b86d7488930824cc}, intrahash = {b3c6a7c433dc0137e177a389e93373d6}, journal = {JMIR Medical Informatics}, keywords = {AutoML Automated Learning Machine application models parameter predictive selection study transferable}, month = {July}, number = 7, timestamp = {2020-11-04T15:46:26.000+0100}, title = {Applicability of an Automated Model and Parameter Selection in the Prediction of Screening-Level PTSD in Danish Soldiers Following Deployment: Development Study of Transferable Predictive Models Using Automated Machine Learning}, url = {https://europepmc.org/article/pmc/pmc7407253}, volume = 8, year = 2020 }
- M. Karaglani, K. Gourlia, I. Tsamardinos, and E. Chatzaki, "Accurate Blood-Based Diagnostic Biosignatures for Alzheimer’s Disease via Automated Machine Learning," Journal of Clinical Medicine, vol. 9, p. 3016, 2020. doi:10.3390/jcm9093016
Alzheimer’s disease (AD) is the most common form of neurodegenerative dementia and its timely diagnosis remains a major challenge in biomarker discovery. In the present study, we analyzed publicly available high-throughput low-sample -omics datasets from studies in AD blood, by the AutoML technology Just Add Data Bio (JADBIO), to construct accurate predictive models for use as diagnostic biosignatures. Considering data from AD patients and age–sex matched cognitively healthy individuals, we produced three best performing diagnostic biosignatures specific for the presence of AD: A. A 506-feature transcriptomic dataset from 48 AD and 22 controls led to a miRNA-based biosignature via Support Vector Machines with three miRNA predictors (AUC 0.975 (0.906, 1.000)), B. A 38,327-feature transcriptomic dataset from 134 AD and 100 controls led to six mRNA-based statistically equivalent signatures via Classification Random Forests with 25 mRNA predictors (AUC 0.846 (0.778, 0.905)) and C. A 9483-feature proteomic dataset from 25 AD and 37 controls led to a protein-based biosignature via Ridge Logistic Regression with seven protein predictors (AUC 0.921 (0.849, 0.972)). These performance metrics were also validated through the JADBIO pipeline confirming stability. In conclusion, using the automated machine learning tool JADBIO, we produced accurate predictive biosignatures extrapolating available low sample -omics data. These results offer options for minimally invasive blood-based diagnostic tests for AD, awaiting clinical validation based on respective laboratory assays. They also highlight the value of AutoML in biomarker discovery
@article{karaglani2020accurate, abstract = {Alzheimer’s disease (AD) is the most common form of neurodegenerative dementia and its timely diagnosis remains a major challenge in biomarker discovery. In the present study, we analyzed publicly available high-throughput low-sample -omics datasets from studies in AD blood, by the AutoML technology Just Add Data Bio (JADBIO), to construct accurate predictive models for use as diagnostic biosignatures. Considering data from AD patients and age–sex matched cognitively healthy individuals, we produced three best performing diagnostic biosignatures specific for the presence of AD: A. A 506-feature transcriptomic dataset from 48 AD and 22 controls led to a miRNA-based biosignature via Support Vector Machines with three miRNA predictors (AUC 0.975 (0.906, 1.000)), B. A 38,327-feature transcriptomic dataset from 134 AD and 100 controls led to six mRNA-based statistically equivalent signatures via Classification Random Forests with 25 mRNA predictors (AUC 0.846 (0.778, 0.905)) and C. A 9483-feature proteomic dataset from 25 AD and 37 controls led to a protein-based biosignature via Ridge Logistic Regression with seven protein predictors (AUC 0.921 (0.849, 0.972)). These performance metrics were also validated through the JADBIO pipeline confirming stability. In conclusion, using the automated machine learning tool JADBIO, we produced accurate predictive biosignatures extrapolating available low sample -omics data. These results offer options for minimally invasive blood-based diagnostic tests for AD, awaiting clinical validation based on respective laboratory assays. 
They also highlight the value of AutoML in biomarker discovery}, added-at = {2020-09-21T09:50:10.000+0200}, author = {Karaglani, Makrina and Gourlia, Krystallia and Tsamardinos, Ioannis and Chatzaki, Ekaterini}, biburl = {https://www.bibsonomy.org/bibtex/2e4b7bd7e2db5b045f13fdb58de2c0b3c/mensxmachina}, doi = {10.3390/jcm9093016}, interhash = {9805a10159c6a371c134c15364c30b15}, intrahash = {e4b7bd7e2db5b045f13fdb58de2c0b3c}, journal = {Journal of Clinical Medicine }, keywords = {Alzheimer’s blood classifier disease learning machine model predictive}, pages = 3016, timestamp = {2021-03-18T08:42:28.000+0100}, title = {Accurate Blood-Based Diagnostic Biosignatures for Alzheimer’s Disease via Automated Machine Learning}, url = {https://www.mdpi.com/2077-0383/9/9/3016}, volume = 9, year = 2020 }
- K. Biza, I. Tsamardinos, and S. Triantafillou, "Tuning Causal Discovery Algorithms," Proceedings of the Tenth International Conference on Probabilistic Graphical Models, in PMLR, 2020.
There are numerous algorithms proposed in the literature for learning causal graphical probabilistic models. Each one of them is typically equipped with one or more tuning hyper-parameters. The choice of optimal algorithm and hyper-parameter values is not universal; it depends on the size of the network, the density of the true causal structure, the sample size, as well as the metric of quality of learning a causal structure. Thus, the challenge to a practitioner is how to “tune” these choices, given that the true graph is unknown and the learning task is unsupervised. In the paper, we evaluate two previously proposed methods for tuning, one based on stability of the learned structure under perturbations (bootstrapping) of the input data and the other based on balancing the in-sample fitting of the model with the model complexity. We propose and comparatively evaluate a new method that treats a causal model as a set of predictive models: one for each node given its Markov Blanket. It then tunes the choices using out-of-sample protocols for supervised methods such as cross-validation. The proposed method performs on par with or better than the previous methods for most metrics.
@article{noauthororeditor, abstract = {There are numerous algorithms proposed in the literature for learning causal graphical probabilistic models. Each one of them is typically equipped with one or more tuning hyper-parameters. The choice of optimal algorithm and hyper-parameter values is not universal; it depends on the size of the network, the density of the true causal structure, the sample size, as well as the metric of quality of learning a causal structure. Thus, the challenge to a practitioner is how to “tune” these choices, given that the true graph is unknown and the learning task is unsupervised. In the paper, we evaluate two previously proposed methods for tuning, one based on stability of the learned structure under perturbations (bootstrapping) of the input data and the other based on balancing the in-sample fitting of the model with the model complexity. We propose and comparatively evaluate a new method that treats a causal model as a set of predictive models: one for each node given its Markov Blanket. It then tunes the choices using out-of-sample protocols for supervised methods such as cross-validation. The proposed method performs on par or better than the previous methods for most metrics.}, added-at = {2020-09-08T10:21:24.000+0200}, author = {Biza, K. and Tsamardinos, I. and Triantafillou, S.}, biburl = {https://www.bibsonomy.org/bibtex/227a19344432d6e2831ae6fac806fe077/mensxmachina}, interhash = {8a79a42946abf9f62c4d45b8110b6a94}, intrahash = {27a19344432d6e2831ae6fac806fe077}, journal = {Proceedings of the Tenth International Conference on Probabilistic Graphical Models, in PMLR}, keywords = {mxmcausalpath}, timestamp = {2021-03-08T12:02:38.000+0100}, title = {Tuning Causal Discovery Algorithms}, url = {https://pgm2020.cs.aau.dk/wp-content/uploads/2020/09/biza20.pdf}, year = 2020 }
- I. Karagiannaki, Y. Pantazis, E. Chatzaki, and I. Tsamardinos, "Pathway Activity Score Learning for Dimensionality Reduction of Gene Expression Data," Discovery Science. DS 2020. Lecture Notes in Computer Science, vol. 12323, pp. 246-261, 2020. doi:10.1007/978-3-030-61527-7_17
Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (e.g., high dimensional data). However, there exist lower-dimensional representations that retain the useful information. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL’s latent space has a relatively straight-forward biological interpretation. As a use-case, PASL is applied on two collections of breast cancer and leukemia gene expression datasets. We show that PASL does retain the predictive information for disease classification on new, unseen datasets, as well as outperforming PLIER, a recently proposed competitive method. We also show that differential activation pathway analysis provides complementary information to standard gene set enrichment analysis. The code is available at https://github.com/mensxmachina/PASL.
@article{noauthororeditor, abstract = {Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (e.g., high dimensional data). However, there exist lower-dimensional representations that retain the useful information. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL’s latent space has a relatively straight-forward biological interpretation. As a use-case, PASL is applied on two collections of breast cancer and leukemia gene expression datasets. We show that PASL does retain the predictive information for disease classification on new, unseen datasets, as well as outperforming PLIER, a recently proposed competitive method. We also show that differential activation pathway analysis provides complementary information to standard gene set enrichment analysis. The code is available at https://github.com/mensxmachina/PASL.}, added-at = {2020-09-07T12:31:50.000+0200}, author = {Karagiannaki, Ioulia and Pantazis, Yannis and Chatzaki, Ekaterini and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/25aa1c97303026e34c4c5d8a76d116652/mensxmachina}, doi = {https://doi.org/10.1007/978-3-030-61527-7_17}, editor = {"Tsoumakas, G" and "Manolopoulos, Y" and "Matwin, S"}, interhash = {250e1c55d999f5493581587cf0627a28}, intrahash = {5aa1c97303026e34c4c5d8a76d116652}, journal = {Discovery Science. DS 2020. Lecture Notes in Computer Science}, keywords = {mxmcausalpath}, pages = {246-261}, timestamp = {2021-03-18T08:43:09.000+0100}, title = {Pathway Activity Score Learning for Dimensionality Reduction of Gene Expression Data}, url = {https://link.springer.com/chapter/10.1007%2F978-3-030-61527-7_17}, volume = 12323, year = 2020 }
- I. Tsamardinos, G. Fanourgakis, E. Greasidou, E. Klontzas, K. Gkagkas, and G. Froudakis, "An Automated Machine Learning architecture for the accelerated prediction of Metal-Organic Frameworks performance in energy and environmental applications," Microporous and Mesoporous Materials, vol. 300, 2020. doi:10.1016/j.micromeso.2020.110160
Due to their exceptional host-guest properties, Metal-Organic Frameworks (MOFs) are promising materials for storage of various gases with environmental and technological interest. Molecular modeling and simulations are invaluable tools, extensively used over the last two decades for the study of various properties of MOFs. In particular, Monte Carlo simulation techniques have been employed for the study of the gas uptake capacity of several MOFs at a wide range of different thermodynamic conditions. Despite the accurate predictions of molecular simulations, the accurate characterization and the high-throughput screening of the enormous number of MOFs that can be potentially synthesized by combining various structural building blocks is beyond present computer capabilities. In this work, we propose and demonstrate the use of an alternative approach, namely one based on an Automated Machine Learning (AutoML) architecture that is capable of training machine learning and statistical predictive models for MOFs’ chemical properties and estimate their predictive performance with confidence intervals. The architecture tries numerous combinations of different machine learning (ML) algorithms, tunes their hyper-parameters, and conservatively estimates performance of the final model. We demonstrate that it correctly estimates performance even with few samples (<100) and that it provides improved predictions over trying a single standard method, like Random Forests. The AutoML pipeline democratizes ML to non-expert material-science practitioners that may not know which algorithms to use on a given problem, how to tune them, and how to correctly estimate their predictive performance, dramatically improving productivity and avoiding common analysis pitfalls. A demonstration on the prediction of the carbon dioxide and methane uptake at various thermodynamic conditions is used as a showcase sharable at https://app.jadbio.com/share/86477fd7-d467-464d-ac41-fcbb0475444b.
@article{noauthororeditor, abstract = {Due to their exceptional host-guest properties, Metal-Organic Frameworks (MOFs) are promising materials for storage of various gases with environmental and technological interest. Molecular modeling and simulations are invaluable tools, extensively used over the last two decades for the study of various properties of MOFs. In particular, Monte Carlo simulation techniques have been employed for the study of the gas uptake capacity of several MOFs at a wide range of different thermodynamic conditions. Despite the accurate predictions of molecular simulations, the accurate characterization and the high-throughput screening of the enormous number of MOFs that can be potentially synthesized by combining various structural building blocks is beyond present computer capabilities. In this work, we propose and demonstrate the use of an alternative approach, namely one based on an Automated Machine Learning (AutoML) architecture that is capable of training machine learning and statistical predictive models for MOFs’ chemical properties and estimate their predictive performance with confidence intervals. The architecture tries numerous combinations of different machine learning (ML) algorithms, tunes their hyper-parameters, and conservatively estimates performance of the final model. We demonstrate that it correctly estimates performance even with few samples (<100) and that it provides improved predictions over trying a single standard method, like Random Forests. The AutoML pipeline democratizes ML to non-expert material-science practitioners that may not know which algorithms to use on a given problem, how to tune them, and how to correctly estimate their predictive performance, dramatically improving productivity and avoiding common analysis pitfalls. 
A demonstration on the prediction of the carbon dioxide and methane uptake at various thermodynamic conditions is used as a showcase sharable at https://app.jadbio.com/share/86477fd7-d467-464d-ac41-fcbb0475444b.}, added-at = {2020-04-15T10:00:20.000+0200}, author = {Tsamardinos, Ioannis and Fanourgakis, George and Greasidou, Elissavet and Klontzas, Emmanuel and Gkagkas, Konstantinos and Froudakis, George}, biburl = {https://www.bibsonomy.org/bibtex/27ca892254c8e863256291ecf21aa1ba8/mensxmachina}, doi = {https://doi.org/10.1016/j.micromeso.2020.110160}, interhash = {38928b1e64d735cbd81ebdb04cf9f6a0}, intrahash = {7ca892254c8e863256291ecf21aa1ba8}, journal = {Microporous and Mesoporos Materials }, keywords = {mxmcausalpath}, timestamp = {2021-03-08T12:13:54.000+0100}, title = {An Automated Machine Learning architecture for the accelerated prediction of Metal-Organic Frameworks performance in energy and environmental applications}, url = {https://www.sciencedirect.com/science/article/abs/pii/S1387181120301633}, volume = 300, year = 2020 }
- K. Verrou, I. Tsamardinos, and G. Papoutsoglou, "Learning Pathway Dynamics from Single‐Cell Proteomic Data: A Comparative Study," Cytometry Part A, Special Issue: Machine Learning for Single Cell Data, vol. 97, iss. 3, 2020. doi:10.1002/cyto.a.23976
Single‐cell platforms provide statistically large samples of snapshot observations capable of resolving intercellular heterogeneity. Currently, there is a growing literature on algorithms that exploit this attribute in order to infer the trajectory of biological mechanisms, such as cell proliferation and differentiation. Despite the efforts, the trajectory inference methodology has not yet been used for addressing the challenging problem of learning the dynamics of protein signaling systems. In this work, we assess this prospect by testing the performance of this class of algorithms on four proteomic temporal datasets. To evaluate the learning quality, we design new general‐purpose evaluation metrics that are able to quantify performance on (i) the biological meaning of the output, (ii) the consistency of the inferred trajectory, (iii) the algorithm robustness, (iv) the correlation of the learning output with the initial dataset, and (v) the roughness of the cell parameter levels through the inferred trajectory. We show that experimental time alone is insufficient to provide knowledge about the order of proteins during signal transduction. Accordingly, we show that the inferred trajectories provide richer information about the underlying dynamics. We learn that established methods tested on high‐dimensional data with small sample size, slow dynamics, and complex structures (e.g. bifurcations) cannot always work in the signaling setting. Among the methods we evaluate, Scorpius and a newly introduced approach that combines Diffusion Maps and Principal Curves were found to perform adequately in recovering the progression of signal transduction, although their performance on some metrics varies from one dataset to another. The novel metrics we devise highlight that it is difficult to conclude which single method is universally applicable for the task. Arguably, there are still many challenges and open problems to resolve. © 2020 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
@article{noauthororeditor, abstract = {Single‐cell platforms provide statistically large samples of snapshot observations capable of resolving intrercellular heterogeneity. Currently, there is a growing literature on algorithms that exploit this attribute in order to infer the trajectory of biological mechanisms, such as cell proliferation and differentiation. Despite the efforts, the trajectory inference methodology has not yet been used for addressing the challenging problem of learning the dynamics of protein signaling systems. In this work, we assess this prospect by testing the performance of this class of algorithms on four proteomic temporal datasets. To evaluate the learning quality, we design new general‐purpose evaluation metrics that are able to quantify performance on (i) the biological meaning of the output, (ii) the consistency of the inferred trajectory, (iii) the algorithm robustness, (iv) the correlation of the learning output with the initial dataset, and (v) the roughness of the cell parameter levels though the inferred trajectory. We show that experimental time alone is insufficient to provide knowledge about the order of proteins during signal transduction. Accordingly, we show that the inferred trajectories provide richer information about the underlying dynamics. We learn that established methods tested on high‐dimensional data with small sample size, slow dynamics, and complex structures (e.g. bifurcations) cannot always work in the signaling setting. Among the methods we evaluate, Scorpius and a newly introduced approach that combines Diffusion Maps and Principal Curves were found to perform adequately in recovering the progression of signal transduction although their performance on some metrics varies from one dataset to another. The novel metrics we devise highlight that it is difficult to conclude, which one method is universally applicable for the task. Arguably, there are still many challenges and open problems to resolve. 
© 2020 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.}, added-at = {2020-04-15T09:49:24.000+0200}, author = {Verrou, Klio-Maria and Tsamardinos, Ioannis and Papoutsoglou, Georgios}, biburl = {https://www.bibsonomy.org/bibtex/2e3bfa8becd8b4b2537754b410e035264/mensxmachina}, doi = {https://doi.org/10.1002/cyto.a.23976}, interhash = {cfaefe697d477e338b1b5b57bc0e7335}, intrahash = {e3bfa8becd8b4b2537754b410e035264}, journal = {Cytometry part A, Special Issue: Machine Learning for Single Cell Data,}, keywords = {mxmcausalpath}, number = 3, timestamp = {2021-03-08T12:03:01.000+0100}, title = {Learning Pathway Dynamics from Single‐Cell Proteomic Data: A Comparative Study}, url = {https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.23976}, volume = 97, year = 2020 }
- A. Agrapetidou, P. Charonyktakis, P. Gogas, T. Papadimitriou, and I. Tsamardinos, "An AutoML application to forecasting bank failures," Applied Economics Letters, 2020. doi:10.1080/13504851.2020.1725230
We investigate the performance of an automated machine learning (AutoML) methodology, called Just Add Data (JAD), in forecasting bank failures. We include all failed U.S. banks for 2007–2013 and twice as many healthy ones. An automated feature selection procedure in JAD identifies the most significant forecasters and a bootstrapping methodology provides conservative estimates of performance generalization and confidence intervals. The best performing model yields an AUC of 0.985. The current work provides evidence that JAD, and AutoML tools in general, could increase the productivity of financial data analysts, shield against methodological statistical errors, and provide models on par with state-of-the-art manual analysis.
@article{noauthororeditor, abstract = {We investigate the performance of an automated machine learning (AutoML) methodology in forecasting bank failures, called Just Add Data (JAD). We include all failed U.S. banks for 2007–2013 and twice as many healthy ones. An automated feature selection procedure in JAD identifies the most significant forecasters and a bootstrapping methodology provides conservative estimates of performance generalization and confidence intervals. The best performing model yields an AUC 0.985. The current work provides evidence that JAD, and AutoML tools in general, could increase the productivity of financial data analysts, shield against methodological statistical errors, and provide models at par with state-of-the-art manual analysis.}, added-at = {2020-04-15T09:44:58.000+0200}, author = {Agrapetidou, Anna and Charonyktakis, Paulos and Gogas, Periklis and Papadimitriou, Theofilos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2485506a39ec5bcb0d027c9ceaaffd99f/mensxmachina}, doi = {https://doi.org/10.1080/13504851.2020.1725230}, interhash = {681df752dae5354ddadb9b582760dccc}, intrahash = {485506a39ec5bcb0d027c9ceaaffd99f}, journal = {Applied Economics Letters }, keywords = {automl}, timestamp = {2020-04-15T09:44:58.000+0200}, title = {An AutoML application to forecasting bank failures}, url = {https://www.tandfonline.com/doi/citedby/10.1080/13504851.2020.1725230?scroll=top&needAccess=true}, year = 2020 }
- I. Xanthopoulos, I. Tsamardinos, V. Christophides, E. Simon, and A. Salinger, "Putting the Human Back in the AutoML Loop," in EDBT/ICDT Workshops, 2020.
Automated Machine Learning (AutoML) is a rapidly rising sub-field of Machine Learning. AutoML aims to fully automate the machine learning process end-to-end, democratizing Machine Learning to non-experts and drastically increasing the productivity of expert analysts. So far, most comparisons of AutoML systems focus on quantitative criteria such as predictive performance and execution time. In this paper, we examine AutoML services for predictive modeling tasks from a user's perspective, going beyond predictive performance. We present a wide palette of criteria and dimensions on which to evaluate and compare these services as a user. This qualitative comparative methodology is applied to seven AutoML systems, namely Auger.AI, BigML, H2O's Driverless AI, Darwin, Just Add Data Bio, Rapid-Miner, and Watson. The comparison indicates the strengths and weaknesses of each service, the needs that it covers, the segment of users that it is most appropriate for, and the possibilities for improvements.
@inproceedings{conf/edbt/XanthopoulosTCS20, abstract = {Automated Machine Learning (AutoML) is a rapidly rising sub-field of Machine Learning. AutoML aims to fully automate the machine learning process end-to-end, democratizing Machine Learning to non-experts and drastically increasing the productivity of expert analysts. So far, most comparisons of AutoML systems focus on quantitative criteria such as predictive performance and execution time. In this paper, we examine AutoML services for predictive modeling tasks from a user's perspective, going beyond predictive performance. We present a wide palette of criteria and dimensions on which to evaluate and compare these services as a user. This qualitative comparative methodology is applied on seven AutoML systems, namely Auger.AI, BigML, H2O's Driverless AI, Darwin, Just Add Data Bio, Rapid-Miner, and Watson. The comparison indicates the strengths and weaknesses of each service, the needs that it covers, the segment of users that is most appropriate for, and the possibilities for improvements.}, added-at = {2020-04-10T12:29:09.000+0200}, author = {Xanthopoulos, Iordanis and Tsamardinos, Ioannis and Christophides, Vassilis and Simon, Eric and Salinger, Alejandro}, biburl = {https://www.bibsonomy.org/bibtex/24a1699e69e6518a1e50bc2ebea5da825/mensxmachina}, booktitle = {EDBT/ICDT Workshops}, crossref = {conf/edbt/2020w}, editor = {Poulovassilis, Alexandra and Auber, David and Bikakis, Nikos and Chrysanthis, Panos K. 
and Papastefanatos, George and Sharaf, Mohamed and Pelekis, Nikos and Renso, Chiara and Theodoridis, Yannis and Zeitouni, Karine and Cerquitelli, Tania and Chiusano, Silvia and Vargas-Solar, Genoveva and Omidvar-Tehrani, Behrooz and Morik, Katharina and Renders, Jean-Michel and Firmani, Donatella and Tanca, Letizia and Mottin, Davide and Lissandrini, Matteo and Velegrakis, Yannis}, ee = {http://ceur-ws.org/Vol-2578/ETMLP5.pdf}, interhash = {80875b3bcb7ce5f7b780107cb86039fa}, intrahash = {4a1699e69e6518a1e50bc2ebea5da825}, keywords = {mxmcausalpath}, publisher = {CEUR-WS.org}, series = {CEUR Workshop Proceedings}, timestamp = {2021-03-08T12:15:34.000+0100}, title = {Putting the Human Back in the AutoML Loop.}, url = {http://ceur-ws.org/Vol-2578/ETMLP5.pdf}, volume = 2578, year = 2020 }
- N. Malliaraki, K. Lakiotaki, R. Vamvoukaki, G. Notas, I. Tsamardinos, M. Kampa, and E. Castanas, "Translating vitamin D transcriptomics to clinical evidence: Analysis of data in asthma and chronic obstructive pulmonary disease, followed by clinical data meta-analysis," The Journal of Steroid Biochemistry and Molecular Biology, vol. 197, pp. 1-14, 2020. doi:10.1016/j.jsbmb.2019.105505
Vitamin D (VitD) continues to trigger intense scientific controversy, regarding both its biological targets and its supplementation doses and regimens. In an effort to resolve this dispute, we mapped VitD transcriptome-wide events in humans, in order to unveil shared patterns or mechanisms with diverse pathologies/tissue profiles and reveal causal effects between VitD actions and specific human diseases, using a recently developed bioinformatics methodology. Using the similarities in analyzed transcriptome data (c-SKL method), we validated our methodology with osteoporosis as an example and further analyzed two other strong hits, specifically chronic obstructive pulmonary disease (COPD) and asthma. The latter revealed no impact of VitD on known molecular pathways. In accordance with this finding, review and meta-analysis of published data, based on an objective measure (Forced Expiratory Volume at one second, FEV1%), did not further reveal any significant effect of VitD on the objective amelioration of either condition. This study may, therefore, be regarded as the first one to explore, in an objective, unbiased and unsupervised manner, the impact of VitD levels and/or interventions in a number of human pathologies.
@article{noauthororeditor, abstract = {Vitamin D (VitD) continues to trigger intense scientific controversy, regarding both its bi ological targets and its supplementation doses and regimens. In an effort to resolve this dispute, we mapped VitD transcriptome-wide events in humans, in order to unveil shared patterns or mechanisms with diverse pathologies/tissue profiles and reveal causal effects between VitD actions and specific human diseases, using a recently developed bioinformatics methodology. Using the similarities in analyzed transcriptome data (c-SKL method), we validated our methodology with osteoporosis as an example and further analyzed two other strong hits, specifically chronic obstructive pulmonary disease (COPD) and asthma. The latter revealed no impact of VitD on known molecular pathways. In accordance to this finding, review and meta-analysis of published data, based on an objective measure (Forced Expiratory Volume at one second, FEV1%) did not further reveal any significant effect of VitD on the objective amelioration of either condition. 
This study may, therefore, be regarded as the first one to explore, in an objective, unbiased and unsupervised manner, the impact of VitD levels and/or interventions in a number of human pathologies.}, added-at = {2019-12-20T11:38:36.000+0100}, author = {Malliaraki, Niki and Lakiotaki, Kleanthi and Vamvoukaki, Rodanthi and Notas, George and Tsamardinos, Ioannis and Kampa, Marilena and Castanas, Elias}, biburl = {https://www.bibsonomy.org/bibtex/28cfb82678ba09a59927a27a148d1959f/mensxmachina}, doi = {https://doi.org/10.1016/j.jsbmb.2019.105505}, interhash = {5288b5fd5d047654c87ff505dfaa1814}, intrahash = {8cfb82678ba09a59927a27a148d1959f}, journal = {The Journal of Steroid Biochemistry and Molecular Biology}, keywords = {mxmcausalpath}, pages = {1-14}, timestamp = {2021-03-18T08:44:15.000+0100}, title = {Translating vitamin D transcriptomics to clinical evidence: Analysis of data in asthma and chronic obstructive pulmonary disease, followed by clinical data meta-analysis}, url = {https://reader.elsevier.com/reader/sd/pii/S096007601930398X?token=BDFDFB0A2D6C3BCB2D6140BFCADFC9742EF3D905A0F5CFB518B320F4235CDDC6C6CF2A14B2FB25CB266333CBB3E631ED}, volume = 197, year = 2020 }
2019
- O. D. Røe, M. Markaki, I. Tsamardinos, V. Lagani, O. T. D. Nguyen, J. H. Pedersen, Z. Saghir, and H. G. Ashraf, "‘Reduced’ HUNT model outperforms NLST and NELSON study criteria in predicting lung cancer in the Danish screening trial," BMJ Open Respiratory Research, vol. 6, iss. 1, 2019. doi:10.1136/bmjresp-2019-000512
Hypothesis: We hypothesise that the validated HUNT Lung Cancer Risk Model would perform better than the NLST (USA) and the NELSON (Dutch‐Belgian) criteria in the Danish Lung Cancer Screening Trial (DLCST). Methods: The DLCST measured only five of the seven variables included in the validated HUNT Lung Cancer Model. Therefore, a ‘Reduced’ model was retrained in the Norwegian HUNT2 cohort using the same statistical methodology as in the original HUNT model but based only on age, pack years, smoking intensity, quit time and body mass index (BMI), adjusted for sex. The model was applied to the DLCST cohort and contrasted against the NLST and NELSON criteria. Results: Among the 4051 smokers in the DLCST with 10 years of follow-up, median age was 57.6, BMI 24.75, pack years 33.8, cigarettes per day 20, and most were current smokers. For the same number of individuals selected for screening, the performance of the ‘Reduced’ HUNT model was higher in all metrics compared with both the NLST and the NELSON criteria. In addition, to achieve the same sensitivity, one would need to screen fewer people with the ‘Reduced’ HUNT model than with either the NLST or the NELSON criteria (709 vs 918, p=1.02e-11 and 1317 vs 1668, p=2.2e-16, respectively). Conclusions: The ‘Reduced’ HUNT model is superior in predicting lung cancer to both the NLST and NELSON criteria in a cost-effective way. This study supports the use of the HUNT Lung Cancer Model for selection based on risk ranking rather than age, pack year and quit time cut-off values. When we know how to rank personal risk, it will be up to the medical community and lawmakers to decide which risk threshold will be set for screening.
@article{noauthororeditor2019reduced, abstract = {Hypothesis We hypothesise that the validated HUNT Lung Cancer Risk Model would perform better than the NLST (USA) and the NELSON (Dutch‐Belgian) criteria in the Danish Lung Cancer Screening Trial (DLCST). Methods The DLCST measured only five out of the seven variables included in validated HUNT Lung Cancer Model. Therefore a ‘Reduced’ model was retrained in the Norwegian HUNT2-cohort using the same statistical methodology as in the original HUNT model but based only on age, pack years, smoking intensity, quit time and body mass index (BMI), adjusted for sex. The model was applied on the DLCST-cohort and contrasted against the NLST and NELSON criteria. Results Among the 4051 smokers in the DLCST with 10 years follow-up, median age was 57.6, BMI 24.75, pack years 33.8, cigarettes per day 20 and most were current smokers. For the same number of individuals selected for screening, the performance of the ‘Reduced’ HUNT was increased in all metrics compared with both the NLST and the NELSON criteria. In addition, to achieve the same sensitivity, one would need to screen fewer people by the ‘Reduced’ HUNT model versus using either the NLST or the NELSON criteria (709 vs 918, p=1.02e-11 and 1317 vs 1668, p=2.2e-16, respectively). Conclusions The ‘Reduced’ HUNT model is superior in predicting lung cancer to both the NLST and NELSON criteria in a cost-effective way. This study supports the use of the HUNT Lung Cancer Model for selection based on risk ranking rather than age, pack year and quit time cut-off values. 
When we know how to rank personal risk, it will be up to the medical community and lawmakers to decide which risk threshold will be set for screening.}, added-at = {2019-11-13T10:18:46.000+0100}, author = {Røe, Oluf Dimitri and Markaki, Maria and Tsamardinos, Ioannis and Lagani, Vincenzo and Nguyen, Olav Toai Duc and Pedersen, Jesper Holst and Saghir, Zaigham and Ashraf, Haseem Gary}, biburl = {https://www.bibsonomy.org/bibtex/2b526991a742c19df51bba5671c8e2015/mensxmachina}, doi = {10.1136/bmjresp-2019-000512}, interhash = {d35f4774e9052e723730411fd1234172}, intrahash = {b526991a742c19df51bba5671c8e2015}, journal = {BMJ Open Respiratory Research}, keywords = {cancer lung}, number = 1, timestamp = {2019-11-13T10:27:39.000+0100}, title = {‘Reduced’ HUNT model outperforms NLST and NELSON study criteria in predicting lung cancer in the Danish screening trial}, url = {https://bmjopenrespres.bmj.com/content/bmjresp/6/1/e000512.full.pdf}, volume = 6, year = 2019 }
- G. Papoutsoglou, V. Lagani, A. Schmidt, K. Tsirlis, D. Gomez-Cabrero, J. Tegner, and I. Tsamardinos, "Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization," Cytometry Part A, 2019. doi:10.1002/cyto.a.23908
Cytometry by time‐of‐flight (CyTOF) has emerged as a high‐throughput single cell technology able to provide large samples of protein readouts. Already, there exists a large pool of advanced high‐dimensional analysis algorithms that explore the observed heterogeneous distributions, making intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used for improving data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes, but not in conjunction with high‐dimensional analytical tools. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
@article{papoutsoglou2019challenges, abstract = {Cytometry by time‐of‐flight (CyTOF) has emerged as a high‐throughput single cell technology able to provide large samples of protein readouts. Already, there exists a large pool of advanced high‐dimensional analysis algorithms that explore the observed heterogeneous distributions making intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline to the distributions of the measured quantities. In this article, we focus on randomization, a transformation used for improving data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes, but not in conjunction with high‐dimensional analytical tools. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.}, added-at = {2019-11-06T12:21:03.000+0100}, author = {Papoutsoglou, Georgios and Lagani, Vincenzo and Schmidt, Angelika and Tsirlis, Konstantinos and Cabrero, David-Gomez and Tegner, Jesper and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2ec8d495b1d35604f30b6fccbbb888292/mensxmachina}, doi = {https://doi.org/10.1002/cyto.a.23908}, interhash = {59ca510bf7d57dfbb97a59313975672e}, intrahash = {ec8d495b1d35604f30b6fccbbb888292}, journal = {Cytometry Part A}, keywords = {mxmcausalpath}, timestamp = {2021-03-08T12:18:30.000+0100}, title = {Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization}, url = {https://onlinelibrary.wiley.com/doi/full/10.1002/cyto.a.23908}, year = 2019 }
- D. Gomez-Cabrero, S. Tarazona, I. Ferreirós-Vidal, R. N. Ramirez, C. Company, A. Schmidt, T. Reijmers, V. von Saint Paul, F. Marabita, J. Rodríguez-Ubreva, A. Garcia-Gomez, T. Carroll, L. Cooper, Z. Liang, G. Dharmalingam, F. van der Kloet, A. C. Harms, L. Balzano-Nogueira, V. Lagani, I. Tsamardinos, M. Lappe, D. Maier, J. A. Westerhuis, T. Hankemeier, A. Imhof, E. Ballestar, A. Mortazavi, M. Merkenschlager, J. Tegner, and A. Conesa, "STATegra, a comprehensive multi-omics dataset of B-cell differentiation in mouse," Scientific Data, vol. 6, iss. 1, 2019. doi:10.1038/s41597-019-0202-7
Multi-omics approaches use a diversity of high-throughput technologies to profile the different molecular layers of living cells. Ideally, the integration of this information should result in comprehensive systems models of cellular physiology and regulation. However, most multi-omics projects still include a limited number of molecular assays and there have been very few multi-omic studies that evaluate dynamic processes such as cellular growth, development and adaptation. Hence, we lack formal analysis methods and comprehensive multi-omics datasets that can be leveraged to develop true multi-layered models for dynamic cellular systems. Here we present the STATegra multi-omics dataset that combines measurements from up to 10 different omics technologies applied to the same biological system, namely the well-studied mouse pre-B-cell differentiation. STATegra includes high-throughput measurements of chromatin structure, gene expression, proteomics and metabolomics, and it is complemented with single-cell data. To our knowledge, the STATegra collection is the most diverse multi-omics dataset describing a dynamic biological system.
@article{Gomez_Cabrero_2019, abstract = {Multi-omics approaches use a diversity of high-throughput technologies to profile the different molecular layers of living cells. Ideally, the integration of this information should result in comprehensive systems models of cellular physiology and regulation. However, most multi-omics projects still include a limited number of molecular assays and there have been very few multi-omic studies that evaluate dynamic processes such as cellular growth, development and adaptation. Hence, we lack formal analysis methods and comprehensive multi-omics datasets that can be leveraged to develop true multi-layered models for dynamic cellular systems. Here we present the STATegra multi-omics dataset that combines measurements from up to 10 different omics technologies applied to the same biological system, namely the well-studied mouse pre-B-cell differentiation. STATegra includes high-throughput measurements of chromatin structure, gene expression, proteomics and metabolomics, and it is complemented with single-cell data. To our knowledge, the STATegra collection is the most diverse multi-omics dataset describing a dynamic biological system.}, added-at = {2019-11-04T10:47:39.000+0100}, author = {Gomez-Cabrero, David and Tarazona, Sonia and Ferreir{\'{o}}s-Vidal, Isabel and Ramirez, Ricardo N. and Company, Carlos and Schmidt, Andreas and Reijmers, Theo and von Saint Paul, Veronica and Marabita, Francesco and Rodr{\'{\i}}guez-Ubreva, Javier and Garcia-Gomez, Antonio and Carroll, Thomas and Cooper, Lee and Liang, Ziwei and Dharmalingam, Gopuraja and van der Kloet, Frans and Harms, Amy C. and Balzano-Nogueira, Leandro and Lagani, Vincenzo and Tsamardinos, Ioannis and Lappe, Michael and Maier, Dieter and Westerhuis, Johan A. 
and Hankemeier, Thomas and Imhof, Axel and Ballestar, Esteban and Mortazavi, Ali and Merkenschlager, Matthias and Tegner, Jesper and Conesa, Ana}, biburl = {https://www.bibsonomy.org/bibtex/26352a4655343755af6e0855281f6943f/mensxmachina}, doi = {10.1038/s41597-019-0202-7}, interhash = {1b1997f9cf4a39e4886fd0fb6384ac41}, intrahash = {6352a4655343755af6e0855281f6943f}, journal = {Scientific Data}, keywords = {mmm}, month = oct, number = 1, publisher = {Springer Science and Business Media {LLC}}, timestamp = {2019-11-04T10:47:39.000+0100}, title = {{STATegra}, a comprehensive multi-omics dataset of B-cell differentiation in mouse}, url = {https://doi.org/10.1038%2Fs41597-019-0202-7}, volume = 6, year = 2019 }
- K. Lakiotaki, G. Georgakopoulos, E. Castanas, O. D. Røe, G. Borboudakis, and I. Tsamardinos, "A data driven approach reveals disease similarity on a molecular level," npj Systems Biology and Applications, vol. 5, iss. 39, pp. 1-10, 2019. doi:10.1038/s41540-019-0117-0
Could there be unexpected similarities between different studies, diseases, or treatments, on a molecular level due to common biological mechanisms involved? To answer this question, we develop a method for computing similarities between empirical, statistical distributions of high-dimensional, low-sample datasets, and apply it on hundreds of -omics studies. The similarities lead to dataset-to-dataset networks visualizing the landscape of a large portion of biological data. Potentially interesting similarities connecting studies of different diseases are assembled in a disease-to-disease network. Exploring it, we discover numerous non-trivial connections between Alzheimer’s disease and schizophrenia, asthma and psoriasis, or liver cancer and obesity, to name a few. We then present a method that identifies the molecular quantities and pathways that contribute the most to the identified similarities and could point to novel drug targets or provide biological insights. The proposed method acts as a “statistical telescope” providing a global view of the constellation of biological data; readers can peek through it at: http://datascope.csd.uoc.gr:25000/.
@article{noauthororeditor, abstract = {Could there be unexpected similarities between different studies, diseases, or treatments, on a molecular level due to common biological mechanisms involved? To answer this question, we develop a method for computing similarities between empirical, statistical distributions of high-dimensional, low-sample datasets, and apply it on hundreds of -omics studies. The similarities lead to dataset-to-dataset networks visualizing the landscape of a large portion of biological data. Potentially interesting similarities connecting studies of different diseases are assembled in a disease-to-disease network. Exploring it, we discover numerous non-trivial connections between Alzheimer’s disease and schizophrenia, asthma and psoriasis, or liver cancer and obesity, to name a few. We then present a method that identifies the molecular quantities and pathways that contribute the most to the identified similarities and could point to novel drug targets or provide biological insights. The proposed method acts as a “statistical telescope” providing a global view of the constellation of biological data; readers can peek through it at: http://datascope.csd.uoc.gr:25000/.}, added-at = {2019-10-29T11:29:09.000+0100}, author = {Lakiotaki, Kleanthi and Georgakopoulos, George and Castanas, Elias and Røe, Oluf Dimitri and Borboudakis, Giorgos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/23c018b31e2b3f946bde052b414e4ea82/mensxmachina}, doi = {10.1038/s41540-019-0117-0}, interhash = {e48ead7f0f6f503fe7647117214a3059}, intrahash = {3c018b31e2b3f946bde052b414e4ea82}, journal = {npj Systems Biology and Applications }, keywords = {mxmcausalpath}, month = oct, number = 39, pages = {1-10}, timestamp = {2021-03-08T12:20:05.000+0100}, title = {A data driven approach reveals disease similarity on a molecular level}, url = {https://www.nature.com/articles/s41540-019-0117-0}, volume = 5, year = 2019 }
- M. Tsagris and I. Tsamardinos, "Feature selection with the R package MXM," F1000Research, vol. 7, p. 1505, 2019. doi:10.12688/f1000research.16216.2
Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those offer few options. The R package MXM offers a variety of feature selection algorithms, and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time to event data the user can choose among Cox, Weibull, log logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one with the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data, that is, data that cannot be loaded into R (on a machine with 16 GB of RAM, for example, R cannot directly load 16 GB of data; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM’s algorithms using real high-dimensional data from various applications.
@article{noauthororeditor, abstract = {Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only few have been implemented in R and made publicly available R as packages while offering few options. The R package MXM offers a variety of feature selection algorithms, and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example with time to event data the user can choose among Cox, Weibull, log logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features, plain speaking, two features can carry statistically equivalent information when substituting one with the other does not effect the inference or the conclusions); and d) it includes memory efficient algorithms for high volume data, data that cannot be loaded into R (In a 16GB RAM terminal for example, R cannot directly load data of 16GB size. By utilizing the proper package, we load the data and then perform feature selection.). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM’s algorithms using real high-dimensional data from various applications. 
}, added-at = {2019-10-15T10:34:55.000+0200}, author = {Tsagris, M and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2e781d8b1b5f054e4b44da2aa2439fa94/mensxmachina}, doi = {10.12688/f1000research.16216.2}, interhash = {eecad4526b5bcd1e6f1ea321c636118f}, intrahash = {e781d8b1b5f054e4b44da2aa2439fa94}, journal = {F1000Research}, keywords = {mxmcausalpath}, pages = 1505, timestamp = {2021-03-08T12:27:47.000+0100}, title = {Feature selection with the R package MXM}, volume = 7, year = 2019 }
- D. Kyriakis, A. Kanterakis, T. Manousaki, A. Tsakogiannis, M. Tsagris, I. Tsamardinos, L. Papaharisis, D. Chatziplis, G. Potamias, and C. Tsigenopoulos, "Scanning of Genetic Variants and Genetic Mapping of Phenotypic Traits in Gilthead Sea Bream Through ddRAD Sequencing," Frontiers in Genetics, vol. 10, p. 675, 2019. doi:10.3389/fgene.2019.00675
Gilthead sea bream (Sparus aurata) is a teleost of considerable economic importance in Southern European aquaculture. The aquaculture industry shows a growing interest in the application of genetic methods that can locate phenotype–genotype associations with high economic impact. Through selective breeding, the aquaculture industry can exploit this information to maximize the financial yield. Here, we present a Genome Wide Association Study (GWAS) of 112 samples belonging to seven different sea bream families collected from a Greek commercial aquaculture company. Through double digest Restriction-site Associated DNA (ddRAD) Sequencing, we generated a per-sample genetic profile consisting of 2,258 high-quality Single Nucleotide Polymorphisms (SNPs). These profiles were tested for association with four phenotypes of major financial importance: Fat, Weight, Tag Weight, and the Length to Width ratio. We applied two methods of association analysis. The first is the typical single-SNP to phenotype test, and the second is a feature selection (FS) method through two novel algorithms that are employed for the first time in aquaculture genomics and produce groups with multiple SNPs associated with a phenotype. In total, we identified 9 single SNPs and 6 groups of SNPs associated with weight-related phenotypes (Weight and Tag Weight), 2 groups associated with Fat, and 16 groups associated with the Length to Width ratio. Six identified loci (Chr4:23265532, Chr6:12617755, Chr8:11613979, Chr13:1098152, Chr15:3260819, and Chr22:14483563) were present in genes associated with growth in other teleosts or even mammals, such as semaphorin-3A and neurotrophin-3. These loci are strong candidates for future studies that will help us unveil the genetic mechanisms underlying growth and improve sea bream aquaculture productivity by providing genomic anchors for selection programs.
@article{noauthororeditor, abstract = {Gilthead sea bream (Sparus aurata) is a teleost of considerable economic importance in Southern European aquaculture. The aquaculture industry shows a growing interest in the application of genetic methods that can locate phenotype–genotype associations with high economic impact. Through selective breeding, the aquaculture industry can exploit this information to maximize the financial yield. Here, we present a Genome Wide Association Study (GWAS) of 112 samples belonging to seven different sea bream families collected from a Greek commercial aquaculture company. Through double digest Random Amplified DNA (ddRAD) Sequencing, we generated a per-sample genetic profile consisting of 2,258 high-quality Single Nucleotide Polymorphisms (SNPs). These profiles were tested for association with four phenotypes of major financial importance: Fat, Weight, Tag Weight, and the Length to Width ratio. We applied two methods of association analysis. The first is the typical single-SNP to phenotype test, and the second is a feature selection (FS) method through two novel algorithms that are employed for the first time in aquaculture genomics and produce groups with multiple SNPs associated to a phenotype. In total, we identified 9 single SNPs and 6 groups of SNPs associated with weight-related phenotypes (Weight and Tag Weight), 2 groups associated with Fat, and 16 groups associated with the Length to Width ratio. Six identified loci (Chr4:23265532, Chr6:12617755, Chr:8:11613979, Chr13:1098152, Chr15:3260819, and Chr22:14483563) were present in genes associated with growth in other teleosts or even mammals, such as semaphorin-3A and neurotrophin-3. These loci are strong candidates for future studies that will help us unveil the genetic mechanisms underlying growth and improve the sea bream aquaculture productivity by providing genomic anchors for selection programs. 
}, added-at = {2019-10-15T10:30:00.000+0200}, author = {Kyriakis, Dimitrios and Kanterakis, Alexandros and Manousaki, Tereza and Tsakogiannis, Alexandros and Tsagris, Michalis and Tsamardinos, Ioannis and Papaharisis, Leonidas and Chatziplis, Dimitris and Potamias, George and Tsigenopoulos, Costas}, biburl = {https://www.bibsonomy.org/bibtex/2dfc6576873fefd21bfc0dd0d979249fc/mensxmachina}, doi = {10.3389/fgene.2019.00675}, interhash = {09fd2ec870d7a4ee7ebcaf4a3d934a96}, intrahash = {dfc6576873fefd21bfc0dd0d979249fc}, journal = {Frontiers in Genetics }, keywords = {dddd}, pages = 675, timestamp = {2019-10-29T11:37:42.000+0100}, title = {Scanning of Genetic Variants and Genetic Mapping of Phenotypic Traits in Gilthead Sea Bream Through ddRAD Sequencing}, volume = 10, year = 2019 }
- S. J. Fernandes, H. Morikawa, E. Ewing, S. Ruhrmann, R. N. Joshi, V. Lagani, N. Karathanasis, M. Khademi, N. Planell, A. Schmidt, I. Tsamardinos, T. Olsson, F. Piehl, I. Kockum, M. Jagodic, J. Tegnér, and D. Gomez-Cabrero, "Non-parametric combination analysis of multiple data types enables detection of novel regulatory mechanisms in T cells of multiple sclerosis patients," Nature Scientific Reports, vol. 9, iss. 11996, 2019. doi:10.1038/s41598-019-48493-7
Multiple Sclerosis (MS) is an autoimmune disease of the central nervous system with prominent neurodegenerative components. The triggering and progression of MS are associated with transcriptional and epigenetic alterations in several tissues, including peripheral blood. The combined influence of transcriptional and epigenetic changes associated with MS has not been assessed in the same individuals. Here we generated paired transcriptomic (RNA-seq) and DNA methylation (Illumina 450K array) profiles of CD4+ and CD8+ T cells (CD4, CD8), using clinically accessible blood from healthy donors and MS patients in the initial relapsing-remitting and subsequent secondary-progressive stage. By integrating the output of a differential expression test with a permutation-based non-parametric combination methodology, we identified 149 differentially expressed (DE) genes in both CD4 and CD8 cells collected from MS patients. Moreover, by leveraging the methylation-dependent regulation of gene expression, we identified the gene SH3YL1, which displayed significant correlated expression and methylation changes in MS patients. Importantly, silencing of SH3YL1 in primary human CD4 cells demonstrated its influence on T cell activation. Collectively, our strategy based on paired sampling of several cell types provides a novel approach to increase sensitivity for identifying shared mechanisms altered in CD4 and CD8 cells of relevance in MS in small-sized clinical materials.
@article{jude2019nonparametric, abstract = {Multiple Sclerosis (MS) is an autoimmune disease of the central nervous system with prominent neurodegenerative components. the triggering and progression of MS is associated with transcriptional and epigenetic alterations in several tissues, including peripheral blood. The combined influence of transcriptional and epigenetic changes associated with MS has not been assessed in the same individuals. Here we generated paired transcriptomic (RNA-seq) and DNA methylation (Illumina 450 K array) profiles of CD4+ and CD8+ T cells (CD4, CD8), using clinically accessible blood from healthy donors and MS patients in the initial relapsing-remitting and subsequent secondary-progressive stage. By integrating the output of a differential expression test with a permutation-based non-parametric combination methodology, we identified 149 differentially expressed (DE) genes in both CD4 and CD8 cells collected from MS patients. Moreover, by leveraging the methylation-dependent regulation of gene expression, we identified the gene SH3YL1, which displayed significant correlated expression and methylation changes in MS patients. Importantly, silencing of SH3YL1 in primary human CD4 cells demonstrated its influence on T cell activation. 
Collectively, our strategy based on paired sampling of several cell-types provides a novel approach to increase sensitivity for identifying shared mechanisms altered in CD4 and CD8 cells of relevance in MS in small sized clinical materials.}, added-at = {2019-09-26T12:00:57.000+0200}, author = {Fernandes Sunja, Jude and Morikawa, Hiromasa and Ewing, Ewoud and Ruhrmann, Sabrina and Joshi Rubin, Narayan and Lagani, Vincenzo and Karathanasis, Nestoras and Khademi, Mohsen and Planell, Nuria and Schmidt, Angelika and Tsamardinos, Ioannis and Olsson, Tomas and Piehl, Fredrik and Kockum, Ingrid and Jagodic, Maja and Tegnér, Jesper and Gomez-Cabrero, David}, biburl = {https://www.bibsonomy.org/bibtex/24e8d52cdff48c449b171f359ee3961d7/mensxmachina}, doi = {10.1038/s41598-019-48493-7}, interhash = {da7a8c5930f294e6881f967fca95fe53}, intrahash = {4e8d52cdff48c449b171f359ee3961d7}, journal = {Nature Scientific Reports}, keywords = {mxmcausalpath}, month = {August}, number = 11996, timestamp = {2021-03-10T08:58:55.000+0100}, title = {Non-parametric combination analysis of multiple data types enables detection of novel regulatory mechanisms in T cells of multiple sclerosis patients}, url = {https://www.nature.com/articles/s41598-019-48493-7}, volume = 9, year = 2019 }
- E. Ewing, L. Kular, S. J. Fernandes, N. Karathanasis, V. Lagani, S. Ruhrmann, I. Tsamardinos, J. Tegner, F. Piehl, D. Gomez-Cabrero, and M. Jagodic, "Combining evidence from four immune cell types identifies DNA methylation patterns that implicate functionally distinct pathways during Multiple Sclerosis progression," EBioMedicine, vol. 43, pp. 411-423, 2019. doi:10.1016/j.ebiom.2019.04.042
Background: Multiple Sclerosis (MS) is a chronic inflammatory disease and a leading cause of progressive neurological disability among young adults. DNA methylation, which intersects genes and environment to control cellular functions on a molecular level, may provide insights into MS pathogenesis. Methods: We measured DNA methylation in CD4+ T cells (n = 31), CD8+ T cells (n = 28), CD14+ monocytes (n = 35) and CD19+ B cells (n = 27) from relapsing-remitting (RRMS) and secondary progressive (SPMS) patients and healthy controls (HC) using Infinium HumanMethylation450 arrays. Monocyte (n = 25) and whole blood (n = 275) cohorts were used for validations. Findings: B cells from MS patients displayed the most significant differentially methylated positions (DMPs), followed by monocytes, while only a few DMPs were detected in T cells. We implemented a non-parametric combination framework (omicsNPC) to increase discovery power by combining evidence from all four cell types. Identified shared DMPs co-localized at MS risk loci and clustered into distinct groups. Functional exploration of changes discriminating RRMS and SPMS from HC implicated lymphocyte signaling, T cell activation and migration. SPMS-specific changes, on the other hand, implicated myeloid cell functions and metabolism. Interestingly, neuronal and neurodegenerative genes and pathways were also specifically enriched in the SPMS cluster. Interpretation: We utilized a statistical framework (omicsNPC) that combines multiple layers of evidence to identify DNA methylation changes that provide new insights into MS pathogenesis in general, and disease progression in particular. Fund: This work was supported by the Swedish Research Council, Stockholm County Council, AstraZeneca, European Research Council, Karolinska Institutet and Margaretha af Ugglas Foundation.
@article{Ewing_2019, abstract = {Background Multiple Sclerosis (MS) is a chronic inflammatory disease and a leading cause of progressive neurological disability among young adults. DNA methylation, which intersects genes and environment to control cellular functions on a molecular level, may provide insights into MS pathogenesis. Methods We measured DNA methylation in CD4+ T cells (n = 31), CD8+ T cells (n = 28), CD14+ monocytes (n = 35) and CD19+ B cells (n = 27) from relapsing-remitting (RRMS), secondary progressive (SPMS) patients and healthy controls (HC) using Infinium HumanMethylation450 arrays. Monocyte (n = 25) and whole blood (n = 275) cohorts were used for validations. Findings B cells from MS patients displayed most significant differentially methylated positions (DMPs), followed by monocytes, while only few DMPs were detected in T cells. We implemented a non-parametric combination framework (omicsNPC) to increase discovery power by combining evidence from all four cell types. Identified shared DMPs co-localized at MS risk loci and clustered into distinct groups. Functional exploration of changes discriminating RRMS and SPMS from HC implicated lymphocyte signaling, T cell activation and migration. SPMS-specific changes, on the other hand, implicated myeloid cell functions and metabolism. Interestingly, neuronal and neurodegenerative genes and pathways were also specifically enriched in the SPMS cluster. Interpretation We utilized a statistical framework (omicsNPC) that combines multiple layers of evidence to identify DNA methylation changes that provide new insights into MS pathogenesis in general, and disease progression, in particular. Fund This work was supported by the Swedish Research Council, Stockholm County Council, AstraZeneca, European Research Council, Karolinska Institutet and Margaretha af Ugglas Foundation.}, added-at = {2019-08-21T10:15:49.000+0200}, author = {Ewing, Ewoud and Kular, Lara and Fernandes, Sunjay J. 
and Karathanasis, Nestoras and Lagani, Vincenzo and Ruhrmann, Sabrina and Tsamardinos, Ioannis and Tegner, Jesper and Piehl, Fredrik and Gomez-Cabrero, David and Jagodic, Maja}, biburl = {https://www.bibsonomy.org/bibtex/2248e805997be2f24632112b442ef4e4b/mensxmachina}, doi = {10.1016/j.ebiom.2019.04.042}, interhash = {d968457cbd123773e47fe5018137eaa2}, intrahash = {248e805997be2f24632112b442ef4e4b}, journal = {{EBioMedicine}}, keywords = {mxmcausalpath}, month = may, pages = {411--423}, publisher = {Elsevier {BV}}, timestamp = {2021-03-10T09:00:03.000+0100}, title = {Combining evidence from four immune cell types identifies {DNA} methylation patterns that implicate functionally distinct pathways during Multiple Sclerosis progression}, url = {https://doi.org/10.1016%2Fj.ebiom.2019.04.042}, volume = 43, year = 2019 }
- M. S. Loos, R. Ramakrishnan, W. Vranken, A. Tsirigotaki, E. Tsare, V. Zorzini, J. D. Geyter, B. Yuan, I. Tsamardinos, M. Klappa, J. Schymkowitz, F. Rousseau, S. Karamanou, and A. Economou, "Structural Basis of the Subcellular Topology Landscape of Escherichia coli," Frontiers in Microbiology, vol. 10, 2019. doi:10.3389/fmicb.2019.01670
Cellular proteomes are distributed in multiple compartments: on DNA, ribosomes, on and inside membranes, or they become secreted. Structural properties that allow polypeptides to occupy subcellular niches, particularly after crossing membranes, remain unclear. We compared intrinsic and extrinsic features in cytoplasmic and secreted polypeptides of the Escherichia coli K-12 proteome. Structural features between the cytoplasmome and secretome are sharply distinct, such that a signal peptide-agnostic machine learning tool distinguishes cytoplasmic from secreted proteins with 95.5% success. Cytoplasmic polypeptides are enriched in aliphatic, aromatic, charged and hydrophobic residues, unique folds and higher early folding propensities. Secretory polypeptides are enriched in polar/small amino acids and β folds, have higher backbone dynamics, higher disorder and contact order, and are more often intrinsically disordered. These non-random distributions and experimental evidence imply that evolutionary pressure selected enhanced secretome flexibility, slow folding and looser structures, placing the secretome in a distinct protein class. These adaptations protect the secretome from premature folding during its cytoplasmic transit, optimize its lipid bilayer crossing and allow it to acquire cell envelope specific chemistries. The latter may favor promiscuous multi-ligand binding, sensing of stress and cell envelope structure changes. In conclusion, enhanced flexibility, slow folding, looser structures and unique folds differentiate the secretome from the cytoplasmome. These findings have wide implications on the structural diversity and evolution of modern proteomes and the protein folding problem.
@article{Loos_2019, added-at = {2019-08-21T10:13:49.000+0200}, author = {Loos, Maria S.
and Ramakrishnan, Reshmi and Vranken, Wim and Tsirigotaki, Alexandra and Tsare, Evrydiki-Pandora and Zorzini, Valentina and Geyter, Jozefien De and Yuan, Biao and Tsamardinos, Ioannis and Klappa, Maria and Schymkowitz, Joost and Rousseau, Frederic and Karamanou, Spyridoula and Economou, Anastassios}, biburl = {https://www.bibsonomy.org/bibtex/252185c42a364f2d694e2ab73919ae419/mensxmachina}, doi = {10.3389/fmicb.2019.01670}, interhash = {01b19ff85dd7e5afe5d2443997f749a5}, intrahash = {52185c42a364f2d694e2ab73919ae419}, journal = {Frontiers in Microbiology}, keywords = {mxmcausalpath}, month = jul, publisher = {Frontiers Media {SA}}, timestamp = {2021-03-10T09:01:43.000+0100}, title = {Structural Basis of the Subcellular Topology Landscape of Escherichia coli}, url = {https://doi.org/10.3389%2Ffmicb.2019.01670}, volume = 10, year = 2019 }
- I. Ferreirós-Vidal, T. Carroll, T. Zhang, V. Lagani, R. N. Ramirez, E. Ing-Simmons, A. Garcia, L. Cooper, Z. Liang, G. Papoutsoglou, G. Dharmalingam, Y. Guo, S. Tarazona, S. J. Fernandes, P. Noori, G. Silberberg, A. G. Fisher, I. Tsamardinos, A. Mortazavi, B. Lenhard, A. Conesa, J. Tegner, M. Merkenschlager, and D. Gomez-Cabrero, "Feedforward regulation of Myc coordinates lineage-specific with housekeeping gene expression during B cell progenitor cell differentiation," PLOS Biology, vol. 17, iss. 4, pp. 1-28, 2019. doi:10.1371/journal.pbio.2006506
The human body is made from billions of cells comprising many specialized cell types. All of these cells ultimately come from a single fertilized oocyte in a process that has two key features: proliferation, which expands cell numbers, and differentiation, which diversifies cell types. Here, we have examined the transition from proliferation to differentiation using B lymphocytes as an example. We find that the transition from proliferation to differentiation involves changes in the expression of genes, which can be categorized into cell-type–specific genes and broadly expressed “housekeeping” genes. The expression of many housekeeping genes is controlled by the gene regulatory factor Myc, whereas the expression of many B lymphocyte–specific genes is controlled by the Ikaros family of gene regulatory proteins. Myc is repressed by Ikaros, which means that changes in housekeeping and tissue-specific gene expression are coordinated during the transition from proliferation to differentiation.
@article{10.1371/journal.pbio.2006506, added-at = {2019-04-15T10:34:19.000+0200}, author = {Ferreirós-Vidal, Isabel and Carroll, Thomas and Zhang, Tianyi and Lagani, Vincenzo and Ramirez, Ricardo N. and Ing-Simmons, Elizabeth and Garcia, Alicia and Cooper, Lee and Liang, Ziwei and Papoutsoglou, Georgios and Dharmalingam, Gopuraja and Guo, Ya and Tarazona, Sonia and Fernandes, Sunjay J. and Noori, Peri and Silberberg, Gilad and Fisher, Amanda G.
and Tsamardinos, Ioannis and Mortazavi, Ali and Lenhard, Boris and Conesa, Ana and Tegner, Jesper and Merkenschlager, Matthias and Gomez-Cabrero, David}, biburl = {https://www.bibsonomy.org/bibtex/2bd3e0f1a5421ea097c5f5c72221afddf/mensxmachina}, doi = {10.1371/journal.pbio.2006506}, interhash = {47806e111971adfc2e5769052393b71e}, intrahash = {bd3e0f1a5421ea097c5f5c72221afddf}, journal = {PLOS Biology}, keywords = {mxmcausalpath}, month = {04}, number = 4, pages = {1-28}, publisher = {Public Library of Science}, timestamp = {2021-03-10T09:19:14.000+0100}, title = {Feedforward regulation of Myc coordinates lineage-specific with housekeeping gene expression during B cell progenitor cell differentiation}, url = {https://doi.org/10.1371/journal.pbio.2006506}, volume = 17, year = 2019 }
- Y. Pantazis and I. Tsamardinos, "A unified approach for sparse dynamical system inference from temporal measurements," Bioinformatics, 2019. doi:10.1093/bioinformatics/btz065
Temporal variations in biological systems and more generally in natural sciences are typically modeled as a set of ordinary, partial or stochastic differential or difference equations. Algorithms for learning the structure and the parameters of a dynamical system are distinguished based on whether time is discrete or continuous, observations are time-series or time-course and whether the system is deterministic or stochastic; however, there is no approach able to handle the various types of dynamical systems simultaneously. In this paper, we present a unified approach to infer both the structure and the parameters of non-linear dynamical systems of any type under the restriction of being linear with respect to the unknown parameters. Our approach, which is named Unified Sparse Dynamics Learning (USDL), consists of two steps. First, an atemporal system of equations is derived through the application of the weak formulation. Then, assuming a sparse representation for the dynamical system, we show that the inference problem can be expressed as a sparse signal recovery problem, allowing the application of an extensive body of algorithms and theoretical results. Results on simulated data demonstrate the efficacy and superiority of the USDL algorithm under multiple interventions and/or stochasticity. Additionally, USDL’s accuracy significantly correlates with theoretical metrics such as the exact recovery coefficient. On real single-cell data, the proposed approach is able to induce high-confidence subgraphs of the signaling pathway. Source code is available at Bioinformatics online. The USDL algorithm has also been integrated in SCENERY (http://scenery.csd.uoc.gr/), an online tool for single-cell mass cytometry analytics. Supplementary data are available at Bioinformatics online.
@article{10.1093/bioinformatics/btz065, added-at = {2019-03-06T13:27:40.000+0100}, author = {Pantazis, Yannis and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2662a8f26ef15e0b79a88593ddc0574fd/mensxmachina}, doi = {10.1093/bioinformatics/btz065}, eprint = {http://oup.prod.sis.lan/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz065/27980298/btz065.pdf}, interhash = {da2952f9b07f650ea261d6ba83859a43}, intrahash = {662a8f26ef15e0b79a88593ddc0574fd}, journal = {Bioinformatics}, keywords = {mxmcausalpath}, month = {01}, timestamp = {2021-03-10T09:20:39.000+0100}, title = {A unified approach for sparse dynamical system inference from temporal measurements}, url = {https://dx.doi.org/10.1093/bioinformatics/btz065}, year = 2019 }
- G. Borboudakis and I. Tsamardinos, "Forward-Backward Selection with Early Dropping," Journal of Machine Learning Research, vol. 20, iss. 8, pp. 1-39, 2019.
Forward-backward selection is one of the most basic and commonly-used feature selection algorithms available. It is also general and conceptually applicable to many different types of data. In this paper, we propose a heuristic that significantly improves its running time, while preserving predictive performance. The idea is to temporarily discard the variables that are conditionally independent of the outcome given the selected variable set. Depending on how those variables are reconsidered and reintroduced, this heuristic gives rise to a family of algorithms with increasingly stronger theoretical guarantees. In distributions that can be faithfully represented by Bayesian networks or maximal ancestral graphs, members of this algorithmic family are able to correctly identify the Markov blanket in the sample limit. In experiments we show that the proposed heuristic increases computational efficiency by about 1-2 orders of magnitude, while selecting fewer or the same number of variables and retaining predictive performance. Furthermore, we show that the proposed algorithm and feature selection with LASSO perform similarly when restricted to select the same number of variables, making the proposed algorithm an attractive alternative for problems where no (efficient) algorithm for LASSO exists.
@article{guyon2019forwardbackward, added-at = {2019-03-06T13:21:10.000+0100}, author = {Borboudakis, Giorgos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/20379540925c4602d54d5acccd0268113/mensxmachina}, editor = {Guyon, Isabelle}, interhash = {af84c72c0490ecd3738da71550eaec18}, intrahash = {0379540925c4602d54d5acccd0268113}, journal = {Journal of Machine Learning Research}, keywords = {mxmcausalpath}, month = {January}, number = 8, pages = {1-39}, pdf = {http://jmlr.org/papers/volume20/17-334/17-334.pdf}, timestamp = {2021-03-10T09:21:04.000+0100}, title = {Forward-Backward Selection with Early Dropping}, url = {http://jmlr.org/papers/volume20/17-334/17-334.pdf}, volume = 20, year = 2019 }
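The early-dropping idea in the abstract above can be illustrated with a minimal sketch. It assumes a user-supplied function `pvalue(var, given)` that returns the p-value of a conditional independence test between `var` and the outcome given the variables in `given`. That function, the threshold `alpha`, the parameter `extra_runs` and the toy oracle below are hypothetical names introduced only for illustration; the published algorithm may differ in its details.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: p-value of "var independent of outcome given `given`".
PValueFn = Callable[[str, Sequence[str]], float]


def fbed(variables: Sequence[str], pvalue: PValueFn,
         alpha: float = 0.05, extra_runs: int = 0) -> List[str]:
    """Forward-backward selection with early dropping (illustrative sketch)."""
    selected: List[str] = []
    runs = 0
    while True:
        size_before = len(selected)
        remaining = [v for v in variables if v not in selected]
        while remaining:
            pvals = {v: pvalue(v, selected) for v in remaining}
            # Early dropping: discard, for the rest of this run, every variable
            # that looks conditionally independent of the outcome.
            remaining = [v for v in remaining if pvals[v] <= alpha]
            if not remaining:
                break
            best = min(remaining, key=lambda v: pvals[v])
            selected.append(best)
            remaining.remove(best)
        runs += 1
        # Dropped variables are reconsidered only in additional forward runs.
        if len(selected) == size_before or runs > extra_runs:
            break
    # Backward phase: remove variables that became redundant.
    while selected:
        pvals = {v: pvalue(v, [s for s in selected if s != v]) for v in selected}
        worst = max(selected, key=lambda v: pvals[v])
        if pvals[worst] <= alpha:
            break
        selected.remove(worst)
    return selected


def toy_pvalue(var: str, given: Sequence[str]) -> float:
    """Toy oracle: A and B matter, C is redundant given A, D is irrelevant."""
    if var == "A":
        return 0.001
    if var == "B":
        return 0.01
    if var == "C":
        return 0.9 if "A" in given else 0.02
    return 0.8


print(fbed(["A", "B", "C", "D"], toy_pvalue))  # ['A', 'B']
```

In this toy run, D is dropped after the first round of tests and C after A is selected, so later iterations test fewer variables, which is where the running-time saving comes from.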
- M. Panagopoulou, M. Karaglani, I. Balgkouranidou, V. Vasilakakis, E. Biziota, T. Koukaki, E. Karamitrousis, E. Nena, I. Tsamardinos, G. Kolios, E. Lianidou, S. Kakolyris, and E. Chatzaki, "Circulating cell free DNA in Breast cancer: size profiling, levels and methylation patterns lead to prognostic and predictive classifiers," Oncogene, vol. 38, iss. 18, pp. 3387-3401, 2019. doi:10.1038/s41388-018-0660-y
Blood circulating cell-free DNA (ccfDNA) is a suggested biosource of valuable clinical information for cancer, meeting the need for a minimally-invasive advancement in the route of precision medicine. In this paper, we evaluated the prognostic and predictive potential of ccfDNA parameters in early and advanced breast cancer. Groups consisted of 150 and 16 breast cancer patients under adjuvant and neoadjuvant therapy respectively, 34 patients with metastatic disease and 35 healthy volunteers. Direct quantification of ccfDNA in plasma revealed elevated concentrations correlated to the incidence of death, shorter PFS, and non-response to pharmacotherapy in the metastatic but not in the other groups. The methylation status of a panel of cancer-related genes chosen based on previous expression and epigenetic data (KLK10, SOX17, WNT5A, MSH2, GATA3) was assessed by quantitative methylation-specific PCR. All genes except GATA3 were more frequently methylated in all the patient groups than in healthy individuals (all p < 0.05). The methylation of WNT5A was statistically significantly correlated to greater tumor size and poor prognosis characteristics and, in advanced stage disease, to shorter OS. In the metastatic group, SOX17 methylation was also significantly correlated to the incidence of death, shorter PFS, and OS. KLK10 methylation was significantly correlated to unfavorable clinicopathological characteristics and relapse, whereas in the adjuvant group to shorter DFI. Methylation of at least 3 or 4 genes was significantly correlated to shorter OS and no pharmacotherapy response, respectively. Classification analysis by fully automated machine learning software produced a single-parametric linear model using ccfDNA plasma concentration values, with great discriminating power to predict response to chemotherapy (AUC 0.803, 95% CI [0.606, 1.000]) in the metastatic group.
Two more multi-parametric signatures were produced for the metastatic group, predicting survival and disease outcome. Finally, a multiple logistic regression model was constructed, discriminating between patient groups and healthy individuals. Overall, ccfDNA emerged as a highly potent predictive classifier in metastatic breast cancer. Upon prospective clinical evaluation, all the signatures produced could aid accurate prognosis.
@article{chatzaki2018circulating, added-at = {2018-12-23T19:41:26.000+0100}, author = {Panagopoulou, Maria and Karaglani, Makrina and Balgkouranidou, Ioanna and Vasilakakis, Vassilis and Biziota, Eirini and Koukaki, Theodora and Karamitrousis, Emmanouil and Nena, Evangelia and Tsamardinos, Ioannis and Kolios, George and Lianidou, Evi and Kakolyris, Stylianos and Chatzaki, Ekaterini}, biburl = {https://www.bibsonomy.org/bibtex/2c5ff834e9bb4c473c863063d11c8c453/mensxmachina}, const = {\ text}, doi = {10.1038/s41388-018-0660-y}, interhash = {a7720c39b927fe3d5b34641f3419e435}, intrahash = {c5ff834e9bb4c473c863063d11c8c453}, journal = {Oncogene }, keywords = {imported}, month = {January}, number = 18, pages = {3387-3401}, timestamp = {2019-09-26T16:06:01.000+0200}, title = {Circulating cell free DNA in Breast cancer: size profiling, levels and methylation patterns lead to prognostic and predictive classifiers}, url = {https://www.nature.com/articles/s41388-018-0660-y}, volume = 38, year = 2019 }
2018
- K. Tsirlis, V. Lagani, S. Triantafillou, and I. Tsamardinos, "On scoring Maximal Ancestral Graphs with the Max–Min Hill Climbing algorithm," International Journal of Approximate Reasoning, vol. 102, pp. 74-85, 2018. doi:10.1016/j.ijar.2018.08.002
We consider the problem of causal structure learning in the presence of latent confounders. We propose a hybrid method, MAG Max–Min Hill-Climbing (M3HC), that takes as input a data set of continuous variables, assumed to follow a multivariate Gaussian distribution, and outputs the best fitting maximal ancestral graph. M3HC builds upon a previously proposed method, namely GSMAG, by introducing a constraint-based first phase that greatly reduces the space of structures to investigate. In large-scale experiments, we show that the proposed algorithm greatly improves on GSMAG in all comparisons, and over a set of known networks from the literature it compares positively against FCI and cFCI as well as competitively against GFCI, three well known constraint-based approaches for causal-network reconstruction in the presence of latent confounders.
@article{Tsirlis_2018, added-at = {2019-02-01T13:46:22.000+0100}, author = {Tsirlis, Konstantinos and Lagani, Vincenzo and Triantafillou, Sofia and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2215e0151afb6ceec0a37601e3028e865/mensxmachina}, doi = {10.1016/j.ijar.2018.08.002}, interhash = {7f00d0fcad8c52b8cce71b771e7cb3e5}, intrahash = {215e0151afb6ceec0a37601e3028e865}, journal = {International Journal of Approximate Reasoning}, keywords = {mxmcausalpath}, month = {November}, pages = {74-85}, publisher = {Elsevier {BV}}, timestamp = {2021-03-10T09:26:09.000+0100}, title = {On scoring Maximal Ancestral Graphs with the Max{\textendash}Min Hill Climbing algorithm}, url = {https://doi.org/10.1016%2Fj.ijar.2018.08.002}, volume = 102, year = 2018 }
- M. Tsagris, "Bayesian Network Learning with the PC Algorithm: An Improved and Correct Variation," Applied Artificial Intelligence, vol. 33, iss. 2, pp. 101-123, 2018. doi:10.1080/08839514.2018.1526760
PC is a prototypical constraint-based algorithm for learning Bayesian networks, a special case of directed acyclic graphs. An existing variant of it, in the R package pcalg, was developed to make the skeleton phase order independent. In return, it has notably increased execution time. In this paper, we clarify that the skeleton phase of the PC algorithm is indeed order independent. The modification we propose outperforms pcalg’s variant of PC in terms of returning correct networks of better quality, as it is less prone to errors, and in some cases it is computationally much cheaper. In addition, we show that pcalg’s variant does not return valid acyclic graphs.
@article{michail2018bayesian, added-at = {2019-02-01T12:44:24.000+0100}, author = {Tsagris, Michail}, biburl = {https://www.bibsonomy.org/bibtex/2af15fddd4692e8ba7df0f00f5de6fd23/mensxmachina}, doi = {10.1080/08839514.2018.1526760}, interhash = {6c22e7e09e9aa7a24536cef5b953528e}, intrahash = {af15fddd4692e8ba7df0f00f5de6fd23}, journal = {Applied Artificial Intelligence }, keywords = {mxmcausalpath}, number = 2, pages = {101-123}, timestamp = {2021-03-10T09:26:33.000+0100}, title = {Bayesian Network Learning with the PC Algorithm: An Improved and Correct Variation}, url = {https://www.researchgate.net/profile/Michail_Tsagris/publication/327884019_Bayesian_Network_Learning_with_the_PC_Algorithm_An_Improved_and_Correct_Variation/links/5bab44c945851574f7e65688/Bayesian-Network-Learning-with-the-PC-Algorithm-An-Improved-and-Correct-Variation.pdf}, volume = 33, year = 2018 }
- I. Tsamardinos, E. Greasidou, and G. Borboudakis, "Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation," Machine Learning, vol. 107, iss. 12, pp. 1895-1922, 2018. doi:10.1007/s10994-018-5714-4
Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely the nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822–829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we again employ the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, using a bootstrap-based statistical criterion, we stop training models on new folds for configurations that are inferior with high probability. We name the resulting method Bootstrap Bias Corrected with Dropping CV (BBCD-CV); it is both efficient and provides accurate performance estimates.
@article{Tsamardinos2018, abstract = {Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration is optimistically biased. We present an efficient bootstrap method that corrects for the bias, called Bootstrap Bias Corrected CV (BBC-CV). BBC-CV's main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely the nested cross-validation (Varma and Simon in BMC Bioinform 7(1):91, 2006) and a method by Tibshirani and Tibshirani (Ann Appl Stat 822--829, 2009), BBC-CV is computationally more efficient, has smaller variance and bias, and is applicable to any metric of performance (accuracy, AUC, concordance index, mean squared error). Subsequently, we employ again the idea of bootstrapping the out-of-sample predictions to speed up the CV process. Specifically, using a bootstrap-based statistical criterion we stop training of models on new folds of inferior (with high probability) configurations. 
We name the method Bootstrap Bias Corrected with Dropping CV (BBCD-CV) that is both efficient and provides accurate performance estimates.}, added-at = {2019-01-18T12:36:47.000+0100}, author = {Tsamardinos, Ioannis and Greasidou, Elissavet and Borboudakis, Giorgos}, biburl = {https://www.bibsonomy.org/bibtex/2a7b174604425057fa831058ace0c969a/mensxmachina}, day = 01, doi = {10.1007/s10994-018-5714-4}, interhash = {97a8dcbb7f6259a554fb36d14f44bc47}, intrahash = {a7b174604425057fa831058ace0c969a}, issn = {1573-0565}, journal = {Machine Learning}, keywords = {mxmcausalpath}, month = dec, number = 12, pages = {1895--1922}, timestamp = {2021-03-10T09:27:16.000+0100}, title = {Bootstrapping the out-of-sample predictions for efficient and accurate cross-validation}, url = {https://doi.org/10.1007/s10994-018-5714-4}, volume = 107, year = 2018 }
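The bias-correction idea in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `bbc_cv`, `accuracy`, `oos_preds` and `n_boot` are hypothetical, the metric is plain accuracy, and the out-of-sample predictions of every configuration are assumed to be already pooled over the CV folds. In each bootstrap round the best configuration is selected on the in-bag samples and scored on the out-of-bag samples, so no model is retrained.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bbc_cv(y, oos_preds, metric=accuracy, n_boot=500, seed=0):
    """Bootstrap estimate of the performance of the selected configuration.

    y          -- true outcomes, one per sample
    oos_preds  -- oos_preds[c][i] is configuration c's out-of-sample
                  prediction for sample i (hypothetical layout)
    """
    rng = random.Random(seed)
    n = len(y)
    out_scores = []
    for _ in range(n_boot):
        # Resample the rows of the prediction matrix with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_set = set(in_bag)
        out_bag = [i for i in range(n) if i not in in_set]
        if not out_bag:  # extremely unlikely unless n is tiny
            continue
        y_in = [y[i] for i in in_bag]
        # Select the best configuration using in-bag predictions only.
        best = max(
            range(len(oos_preds)),
            key=lambda c: metric(y_in, [oos_preds[c][i] for i in in_bag]),
        )
        # Score the selected configuration on samples it was not selected on.
        out_scores.append(
            metric([y[i] for i in out_bag], [oos_preds[best][i] for i in out_bag])
        )
    return sum(out_scores) / len(out_scores)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.randint(0, 1) for _ in range(200)]
    # Thirty configurations that guess at random: none has real signal.
    configs = [[rng.randint(0, 1) for _ in y] for _ in range(30)]
    print("naive best CV accuracy:", max(accuracy(y, c) for c in configs))
    print("BBC-CV estimate:", bbc_cv(y, configs))
```

With configurations that carry no signal, the naive maximum is typically above chance while the bootstrap estimate stays close to it, which is the optimism the method is meant to remove.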
- I. Tsamardinos, G. Borboudakis, P. Katsogridakis, P. Pratikakis, and V. Christophides, "A greedy feature selection algorithm for Big Data of high dimensionality," Machine Learning, vol. 108, iss. 2, pp. 149-202, 2018. doi:10.1007/s10994-018-5748-7
We present the Parallel, Forward--Backward with Pruning (PFBP) algorithm for feature selection (FS) for Big Data of high dimensionality. PFBP partitions the data matrix both in terms of rows as well as columns. By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to a partition while minimizing communication costs, thus massively parallelizing computations. Similar techniques for combining local computations are also employed to create the final predictive model. PFBP employs asymptotically sound heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores. An extensive comparative evaluation also demonstrates the effectiveness of PFBP against other algorithms in its class. The heuristics presented are general and could potentially be employed to other greedy-type of FS algorithms. An application on simulated Single Nucleotide Polymorphism (SNP) data with 500K samples is provided as a use case.
@article{Tsamardinos2018, abstract = {We present the Parallel, Forward--Backward with Pruning (PFBP) algorithm for feature selection (FS) for Big Data of high dimensionality. PFBP partitions the data matrix both in terms of rows as well as columns. By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to a partition while minimizing communication costs, thus massively parallelizing computations. Similar techniques for combining local computations are also employed to create the final predictive model. PFBP employs asymptotically sound heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores. An extensive comparative evaluation also demonstrates the effectiveness of PFBP against other algorithms in its class. The heuristics presented are general and could potentially be employed to other greedy-type of FS algorithms. 
An application on simulated Single Nucleotide Polymorphism (SNP) data with 500K samples is provided as a use case.}, added-at = {2019-01-18T12:33:10.000+0100}, author = {Tsamardinos, Ioannis and Borboudakis, Giorgos and Katsogridakis, Pavlos and Pratikakis, Polyvios and Christophides, Vassilis}, biburl = {https://www.bibsonomy.org/bibtex/252633119492a482314134f28a7639e3b/mensxmachina}, day = 07, doi = {10.1007/s10994-018-5748-7}, interhash = {758062c5debf18afdf43d46ce6105a72}, intrahash = {52633119492a482314134f28a7639e3b}, issn = {1573-0565}, journal = {Machine Learning}, keywords = {mxmcausalpath}, month = {August}, number = 2, pages = {149-202}, timestamp = {2021-03-10T09:27:38.000+0100}, title = {A greedy feature selection algorithm for Big Data of high dimensionality}, url = {https://doi.org/10.1007/s10994-018-5748-7}, volume = 108, year = 2018 }
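The abstract above describes combining p-values computed locally on each data partition through meta-analysis. A minimal sketch of one standard combination rule, Fisher's combined probability test, is given below. The abstract does not say which rule PFBP applies, so the choice of Fisher's method, the function name `fisher_combine` and the example p-values are assumptions made for illustration.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic -2 * sum(log p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    For even degrees of freedom the survival function has a closed form,
    so no external statistics library is needed.
    """
    k = len(p_values)
    # Clamp to avoid log(0) for p-values that underflowed to zero.
    clamped = [min(max(p, 1e-300), 1.0) for p in p_values]
    half = -sum(math.log(p) for p in clamped)  # statistic / 2
    # P(chi2_{2k} > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values of the same test computed on three row partitions.
    local_p = [0.04, 0.20, 0.01]
    print("combined p-value:", fisher_combine(local_p))
```

Only the local p-values travel between workers in such a scheme, which is what keeps communication cost low.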
- M. Adamou, G. Antoniou, E. Greasidou, V. Lagani, P. Charonyktakis, I. Tsamardinos, and M. Doyle, "Toward Automatic Risk Assessment to Support Suicide Prevention," Crisis, vol. 40, pp. 249-256, 2018. doi:10.1027/0227-5910/a000561
Background: Suicide has been considered an important public health issue for years and is one of the main causes of death worldwide. Despite prevention strategies being applied, the rate of suicide has not changed substantially over the past decades. Suicide risk has proven extremely difficult to assess for medical specialists, and traditional methodologies deployed have been ineffective. Advances in machine learning make it possible to attempt to predict suicide with the analysis of relevant data aiming to inform clinical practice. Aims: We aimed to (a) test our artificial intelligence based, referral-centric methodology in the context of the National Health Service (NHS), (b) determine whether statistically relevant results can be derived from data related to previous suicides, and (c) develop ideas for various exploitation strategies. Method: The analysis used data of patients who died by suicide in the period 2013–2016 including both structured data and free-text medical notes, necessitating the deployment of state-of-the-art machine learning and text mining methods. Limitations: Sample size is a limiting factor for this study, along with the absence of non-suicide cases. Specific analytical solutions were adopted for addressing both issues. Results and Conclusion: The results of this pilot study indicate that machine learning shows promise for predicting within a specified period which people are most at risk of taking their own life at the time of referral to a mental health service.
@article{Adamou_2018, abstract = {Background: Suicide has been considered an important public health issue for years and is one of the main causes of death worldwide. Despite prevention strategies being applied, the rate of suicide has not changed substantially over the past decades. Suicide risk has proven extremely difficult to assess for medical specialists, and traditional methodologies deployed have been ineffective. Advances in machine learning make it possible to attempt to predict suicide with the analysis of relevant data aiming to inform clinical practice. Aims: We aimed to (a) test our artificial intelligence based, referral-centric methodology in the context of the National Health Service (NHS), (b) determine whether statistically relevant results can be derived from data related to previous suicides, and (c) develop ideas for various exploitation strategies. Method: The analysis used data of patients who died by suicide in the period 2013–2016 including both structured data and free-text medical notes, necessitating the deployment of state-of-the-art machine learning and text mining methods. Limitations: Sample size is a limiting factor for this study, along with the absence of non-suicide cases. Specific analytical solutions were adopted for addressing both issues. 
Results and Conclusion: The results of this pilot study indicate that machine learning shows promise for predicting within a specified period which people are most at risk of taking their own life at the time of referral to a mental health service.}, added-at = {2019-01-18T12:10:16.000+0100}, author = {Adamou, Marios and Antoniou, Grigoris and Greasidou, Elissavet and Lagani, Vincenzo and Charonyktakis, Paulos and Tsamardinos, Ioannis and Doyle, Michael}, biburl = {https://www.bibsonomy.org/bibtex/221cb477ce69d0e4a6bd26823d7f22342/mensxmachina}, doi = {10.1027/0227-5910/a000561}, interhash = {aefe5230d12e9dad8b6e56b72f05cf9d}, intrahash = {21cb477ce69d0e4a6bd26823d7f22342}, journal = {Crisis}, keywords = {suicide}, month = {November}, pages = {249-256}, publisher = {Hogrefe Publishing Group}, timestamp = {2019-09-26T16:14:48.000+0200}, title = {Toward Automatic Risk Assessment to Support Suicide Prevention}, url = {https://doi.org/10.1027%2F0227-5910%2Fa000561}, volume = 40, year = 2018 }
- M. Tsagris, G. Borboudakis, V. Lagani, and I. Tsamardinos, "Constraint-based causal discovery with mixed data," International Journal of Data Science and Analytics, vol. 6, iss. 1, pp. 19-30, 2018. doi:10.1007/s41060-018-0097-y
We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial and ordinal variables. We use likelihood-ratio tests based on appropriate regression models, and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data, and show that the proposed approach outperforms alternatives in terms of learning accuracy.
@article{Tsagris2018, abstract = {We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial and ordinal variables. We use likelihood-ratio tests based on appropriate regression models, and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data, and show that the proposed approach outperforms alternatives in terms of learning accuracy.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsagris, Michail and Borboudakis, Giorgos and Lagani, Vincenzo and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/223125f845b32b6d3fbc2bdde022623eb/mensxmachina}, const = {\ text}, doi = {10.1007/s41060-018-0097-y}, interhash = {918ab366354b2932cdd4a92d4e7884e3}, intrahash = {23125f845b32b6d3fbc2bdde022623eb}, journal = {International Journal of Data Science and Analytics}, keywords = {mxmcausalpath}, month = {August}, number = 1, pages = {19-30}, timestamp = {2021-03-10T09:54:25.000+0100}, title = {Constraint-based causal discovery with mixed data}, url = {https://doi.org/10.1007/s41060-018-0097-y http://link.springer.com/10.1007/s41060-018-0097-y}, volume = 6, year = 2018 }
- K. Lakiotaki, N. Vorniotakis, M. Tsagris, G. Georgakopoulos, and I. Tsamardinos, "BioDataome: a collection of uniformly preprocessed and automatically annotated datasets for data-driven biology," Database, vol. 2018, iss. bay011, pp. 1-14, 2018. doi:10.1093/database/bay011
Biotechnology revolution generates a plethora of omics data with an exponential growth pace. Therefore, biological data mining demands automatic, ‘high quality’ curation efforts to organize biomedical knowledge into online databases. BioDataome is a database of uniformly preprocessed and disease-annotated omics data with the aim to promote and accelerate the reuse of public data. We followed the same preprocessing pipeline for each biological mart (microarray gene expression, RNA-Seq gene expression and DNA methylation) to produce ready for downstream analysis datasets and automatically annotated them with disease-ontology terms. We also designate datasets that share common samples and automatically discover control samples in case-control studies. Currently, BioDataome includes ∼5600 datasets, ∼260 000 samples spanning ∼500 diseases and can be easily used in large-scale massive experiments and meta-analysis. All datasets are publicly available for querying and downloading via BioDataome web application. We demonstrate BioDataome’s utility by presenting exploratory data analysis examples. We have also developed BioDataome R package found in: https://github.com/mensxmachina/BioDataome/. Database URL: http://dataome.mensxmachina.org/
@article{Lakiotaki2018, abstract = {Biotechnology revolution generates a plethora of omics data with an exponential growth pace. Therefore, biological data mining demands automatic, ‘high quality’ curation efforts to organize biomedical knowledge into online databases. BioDataome is a database of uniformly preprocessed and disease-annotated omics data with the aim to promote and accelerate the reuse of public data. We followed the same preprocessing pipeline for each biological mart (microarray gene expression, RNA-Seq gene expression and DNA methylation) to produce ready for downstream analysis datasets and automatically annotated them with disease-ontology terms. We also designate datasets that share common samples and automatically discover control samples in case-control studies. Currently, BioDataome includes ∼5600 datasets, ∼260 000 samples spanning ∼500 diseases and can be easily used in large-scale massive experiments and meta-analysis. All datasets are publicly available for querying and downloading via BioDataome web application. We demonstrate BioDataome’s utility by presenting exploratory data analysis examples. We have also developed BioDataome R package found in: https://github.com/mensxmachina/BioDataome/. 
Database URL: http://dataome.mensxmachina.org/}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lakiotaki, Kleanthi and Vorniotakis, Nikolaos and Tsagris, Michail and Georgakopoulos, Georgios and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/286ecdf84f503ed17df225b0646dbdbe0/mensxmachina}, const = {\ text}, doi = {10.1093/database/bay011}, interhash = {e6cafd740574c1f9bcf3bb532e0dba46}, intrahash = {86ecdf84f503ed17df225b0646dbdbe0}, journal = {Database}, keywords = {mxmcausalpath}, month = {March}, number = {bay011}, pages = {1-14}, timestamp = {2021-03-10T09:53:31.000+0100}, title = {BioDataome: a collection of uniformly preprocessed and automatically annotated datasets for data-driven biology}, volume = 2018, year = 2018 }
- M. Tsagris, V. Lagani, and I. Tsamardinos, "Feature selection for high-dimensional temporal data," BMC Bioinformatics, vol. 19, iss. 17, pp. 1-14, 2018. doi:10.1186/s12859-018-2023-7
Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work, we extend established constraint-based feature-selection methods to high-dimensional “omics” temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables. The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the analysis task specifics. The use of this algorithm is suggested for variable selection with high-dimensional temporal data.
@article{Tsagris2018a, abstract = {Feature selection is commonly employed for identifying collectively-predictive biomarkers and biosignatures; it facilitates the construction of small statistical models that are easier to verify, visualize, and comprehend while providing insight to the human expert. In this work, we extend established constrained-based, feature-selection methods to high-dimensional “omics” temporal data, where the number of measurements is orders of magnitude larger than the sample size. The extension required the development of conditional independence tests for temporal and/or static variables conditioned on a set of temporal variables. The algorithm is able to return multiple, equivalent solution subsets of variables, scale to tens of thousands of features, and outperform or be on par with existing methods depending on the analysis task specifics. The use of this algorithm is suggested for variable selection with high-dimensional temporal data.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsagris, Michail and Lagani, Vincenzo and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/27a3e7629f52cfe1f19a5aedda217fa22/mensxmachina}, const = {\ text}, doi = {10.1186/s12859-018-2023-7}, interhash = {e57a8785d79c3782b45dd3953a7d7985}, intrahash = {7a3e7629f52cfe1f19a5aedda217fa22}, journal = {BMC Bioinformatics}, keywords = {mxmcausalpath}, month = {January}, number = 17, pages = {1-14}, timestamp = {2021-03-10T09:53:02.000+0100}, title = { Feature selection for high-dimensional temporal data}, url = {https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2023-7}, volume = 19, year = 2018 }
- M. Adamou, G. Antoniou, E. Greassidou, V. Lagani, P. Charonyktakis, and I. Tsamardinos, "Mining Free-Text Medical Notes for Suicide Risk Assessment," in SETN '18 Proceedings of the 10th Hellenic Conference on Artificial Intelligence, 2018. doi:10.1145/3200947.3201020
Suicide has been considered an important public health issue for a very long time, and is one of the main causes of death worldwide. Despite suicide prevention strategies being applied, the rate of suicide has not changed substantially over the past decades. Advances in machine learning make it possible to attempt to predict suicide based on the analysis of relevant data to inform clinical practice. This paper reports on findings from the analysis of data of patients who died by suicide in the period 2013-2016 and made use of both structured data and free-text medical notes. We focus on examining various text-mining approaches to support risk assessment. The results show that, using advanced machine learning and text-mining techniques, it is possible to predict within a specified period which people are most at risk of taking their own life at the time of referral to a mental health service.
@inproceedings{Adamou2018a, abstract = {Suicide has been considered as an important public health issue for a very long time, and is one of the main causes of death worldwide. Despite suicide prevention strategies being applied, the rate of suicide has not changed substantially over the past decades. Advances in machine learning make it possible to attempt to predict suicide based on the analysis of relevant data to inform clinical practice. This paper reports on findings from the analysis of data of patients who died by suicide in the period 2013-2016 and made use of both structured data and free-text medical notes. We focus on examining various text-mining approaches to support risk assessment. The results show that using advance machine learning and text-mining techniques, it is possible to predict within a specified period which people are most at risk of taking their own life at the time of referral to a mental health service.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Adamou, Marios and Antoniou, Grigoris and Greassidou, Elissavet and Lagani, Vincenzo and Charonyktakis, Paulos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2a852fb011200a7f53d1f2f28da6f8ea7/mensxmachina}, booktitle = {SETN '18 Proceedings of the 10th Hellenic Conference on Artificial Intelligence}, doi = {10.1145/3200947.3201020}, interhash = {0bb1d9382f6832a0e6034a1f4b9bce1a}, intrahash = {a852fb011200a7f53d1f2f28da6f8ea7}, keywords = {imported}, number = 47, timestamp = {2019-09-26T16:23:51.000+0200}, title = {Mining Free-Text Medical Notes for Suicide Risk Assessment}, url = {https://www.bibsonomy.org/documents/d5db6ecdc9dc8add8df6d0745e06b7f5/mensxmachina/a47-Adamou.pdf}, year = 2018 }
- M. Markaki, I. Tsamardinos, A. Langhammer, V. Lagani, K. Hveem, and O. D. Røe, "A Validated Clinical Risk Prediction Model for Lung Cancer in Smokers of All Ages and Exposure Types: A HUNT Study.," EBioMedicine, vol. 31, pp. 34-46, 2018. doi:10.1016/j.ebiom.2018.03.027
Lung cancer causes >1·6 million deaths annually, with early diagnosis being paramount to effective treatment. Here we present a validated risk assessment model for lung cancer screening. The prospective HUNT2 population study in Norway examined 65,237 people aged >20 years in 1995-97. After a median of 15·2 years, 583 lung cancer cases had been diagnosed; 552 (94·7%) ever-smokers and 31 (5·3%) never-smokers. We performed multivariable analyses of 36 candidate risk predictors, using multiple imputation of missing data and backwards feature selection with Cox regression. The resulting model was validated in an independent Norwegian prospective dataset of 45,341 ever-smokers, in which 675 lung cancers had been diagnosed after a median follow-up of 11·6 years. Our final HUNT Lung Cancer Model included age, pack-years, smoking intensity, years since smoking cessation, body mass index, daily cough, and hours of daily indoors exposure to smoke. External validation showed a 0·879 concordance index (95% CI 0·866-0·891) with an area under the curve of 0·87 (95% CI 0·85-0·89) within 6 years. Only 22% of ever-smokers would need screening to identify 81·85% of all lung cancers within 6 years. Our model of seven variables is simple, accurate, and useful for screening selection.
@article{Markaki2018, abstract = {Lung cancer causes >1·6 million deaths annually, with early diagnosis being paramount to effective treatment. Here we present a validated risk assessment model for lung cancer screening. The prospective HUNT2 population study in Norway examined 65,237 people aged >20years in 1995-97. After a median of 15·2years, 583 lung cancer cases had been diagnosed; 552 (94·7%) ever-smokers and 31 (5·3%) never-smokers. We performed multivariable analyses of 36 candidate risk predictors, using multiple imputation of missing data and backwards feature selection with Cox regression. The resulting model was validated in an independent Norwegian prospective dataset of 45,341 ever-smokers, in which 675 lung cancers had been diagnosed after a median follow-up of 11·6years. Our final HUNT Lung Cancer Model included age, pack-years, smoking intensity, years since smoking cessation, body mass index, daily cough, and hours of daily indoors exposure to smoke. External validation showed a 0·879 concordance index (95% CI 0·866-0·891) with an area under the curve of 0·87 (95% CI 0·85-0·89) within 6years. Only 22% of ever-smokers would need screening to identify 81·85% of all lung cancers within 6years. 
Our model of seven variables is simple, accurate, and useful for screening selection.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Markaki, Maria and Tsamardinos, Ioannis and Langhammer, Arnulf and Lagani, Vincenzo and Hveem, Kristian and R{\o}e, Oluf Dimitri}, biburl = {https://www.bibsonomy.org/bibtex/252ad9fc293f48de7a8c6b5a218222e24/mensxmachina}, const = {\ text}, doi = {10.1016/j.ebiom.2018.03.027}, interhash = {1bce5eb7bd51017bf19c97cecf30d30e}, intrahash = {52ad9fc293f48de7a8c6b5a218222e24}, journal = {EBioMedicine}, keywords = {imported}, month = may, pages = {34-46}, timestamp = {2019-09-26T16:16:59.000+0200}, title = {A Validated Clinical Risk Prediction Model for Lung Cancer in Smokers of All Ages and Exposure Types: A HUNT Study.}, volume = 31, year = 2018 }
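The external validation above is reported as a concordance index. For readers unfamiliar with the metric, the following is a minimal sketch of Harrell's concordance index for right-censored follow-up data. It is not the study's code: the function name `concordance_index`, the argument layout and the toy values are hypothetical, and pairs with tied follow-up times are simply skipped.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times  -- follow-up time per subject
    events -- 1 if the event (e.g. diagnosis) was observed, 0 if censored
    risks  -- model risk score per subject (higher means higher risk)

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up ended. Tied risk scores
    count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: five subjects, two of them censored.
    times = [2.0, 5.0, 6.0, 8.0, 11.0]
    events = [1, 0, 1, 1, 0]
    risks = [0.9, 0.4, 0.7, 0.2, 0.3]
    print("C-index:", concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of who develops the disease earlier.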
2017
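- M. Tsagris, G. Borboudakis, V. Lagani, and I. Tsamardinos, "Constraint-based Causal Discovery with Mixed Data," in 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Causal Discovery (KDD), 2017.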
We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial and ordinal variables. We use likelihood-ratio tests based on appropriate regression models, and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data, and show that the proposed approach outperforms alternatives in terms of learning accuracy.
@conference{noauthororeditor2017constraintbased, abstract = {We address the problem of constraint-based causal discovery with mixed data types, such as (but not limited to) continuous, binary, multinomial and ordinal variables. We use likelihood-ratio tests based on appropriate regression models, and show how to derive symmetric conditional independence tests. Such tests can then be directly used by existing constraint-based methods with mixed data, such as the PC and FCI algorithms for learning Bayesian networks and maximal ancestral graphs respectively. In experiments on simulated Bayesian networks, we employ the PC algorithm with different conditional independence tests for mixed data, and show that the proposed approach outperforms alternatives in terms of learning accuracy.}, added-at = {2021-03-10T10:58:29.000+0100}, author = {Tsagris, M and Borboudakis, G and Lagani, V and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2892378444240fee14d62fd58362e856a/mensxmachina}, interhash = {87d6a33d891429260e644392ddcba508}, intrahash = {892378444240fee14d62fd58362e856a}, keywords = {mxmcausalpath}, publisher = {23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Causal Discovery (KDD)}, timestamp = {2021-03-10T10:58:29.000+0100}, title = {Constraint-based Causal Discovery with Mixed Data}, url = {http://nugget.unisa.edu.au/CD2017/papersonly/constraint-based-causal-r1.pdf}, year = 2017 }
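- K. Tsirlis, V. Lagani, S. Triantafillou, and I. Tsamardinos, "On Scoring Maximal Ancestral Graphs with the Max-Min Hill Climbing Algorithm," in 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Causal Discovery (KDD), 2017.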
We consider the problem of causal structure learning in the presence of latent confounders. We propose a hybrid method, MAG Max-Min Hill-Climbing (M3HC), that takes as input a data set of continuous variables, assumed to follow a multivariate Gaussian distribution, and outputs the best-fitting maximal ancestral graph. M3HC builds upon a previously proposed method, namely GSMAG, by introducing a constraint-based first phase that greatly reduces the space of structures to investigate. We show on simulated data that the proposed algorithm greatly improves on GSMAG, and compares positively against FCI and cFCI, two well-known constraint-based approaches for causal-network reconstruction in the presence of latent confounders.
@conference{tsirlis2017scoring, abstract = {We consider the problem of causal structure learning in the presence of latent confounders. We propose a hybrid method, MAG Max-Min Hill-Climbing (M3HC), that takes as input a data set of continuous variables, assumed to follow a multivariate Gaussian distribution, and outputs the best-fitting maximal ancestral graph. M3HC builds upon a previously proposed method, namely GSMAG, by introducing a constraint-based first phase that greatly reduces the space of structures to investigate. We show on simulated data that the proposed algorithm greatly improves on GSMAG, and compares positively against FCI and cFCI, two well-known constraint-based approaches for causal-network reconstruction in the presence of latent confounders.}, added-at = {2021-03-10T10:55:47.000+0100}, author = {Tsirlis, K and Lagani, V and Triantafillou, S and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/251782ff3d0021d9ae7b7229b39a55d75/mensxmachina}, interhash = {4731b83fe8b2f1f60eed63d178912109}, intrahash = {51782ff3d0021d9ae7b7229b39a55d75}, keywords = {mxmcausalpath}, publisher = {23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Causal Discovery (KDD)}, timestamp = {2021-03-10T10:55:47.000+0100}, title = {On Scoring Maximal Ancestral Graphs with the Max-Min Hill Climbing Algorithm}, url = {http://nugget.unisa.edu.au/CD2017/papersonly/maxmin-r0.pdf}, year = 2017 }
- G. Borboudakis, T. Stergiannakos, M. Frysali, E. Klontzas, I. Tsamardinos, and G. E. Froudakis, "Chemically intuited, large-scale screening of MOFs by machine learning techniques," NPJ Computational Materials, vol. 3, iss. 40, 2017. doi:10.1038/s41524-017-0045-8
A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the chemical properties of MOFs are indeed predictable (stochastically, not deterministically) using machine learning methods and automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this methodology is promising to apply not only to gas storage in MOFs but in many other material science projects.
@article{borboudakis2017chemically, abstract = {A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the chemical properties of MOFs are indeed predictable (stochastically, not deterministically) using machine learning methods and automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this methodology is promising to apply not only to gas storage in MOFs but in many other material science projects.}, added-at = {2019-09-26T16:47:50.000+0200}, author = {Borboudakis, Giorgos and Stergiannakos, Txiarchis and Frysali, Maria and Klontzas, Emmanuel and Tsamardinos, Ioannis and Froudakis, George E.}, biburl = {https://www.bibsonomy.org/bibtex/25bde5694eb139306e6b6619b48a22be7/mensxmachina}, doi = {10.1038/s41524-017-0045-8}, interhash = {1bc83725a35ce15678399a739e9c76bf}, intrahash = {5bde5694eb139306e6b6619b48a22be7}, journal = {NPJ Computational Materials}, keywords = {MOFs learning machine}, month = {October}, number = 40, timestamp = {2019-09-26T16:47:50.000+0200}, title = {Chemically intuited, large-scale screening of MOFs by machine learning techniques}, url = {https://doi.org/10.1038/s41524-017-0045-8}, volume = 3, year = 2017 }
- V. Lagani, G. Athineou, A. Farcomeni, M. Tsagris, and I. Tsamardinos, "Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets," Journal of Statistical Software, vol. 80, iss. 7, 2017. doi:10.18637/jss.v080.i07
The statistically equivalent signature (SES) algorithm is a method for feature selection inspired by the principles of constraint-based learning of Bayesian networks. Most of the currently available feature selection methods return only a single subset of features, supposedly the one with the highest predictive power. We argue that in several domains multiple subsets can achieve close to maximal predictive accuracy, and that arbitrarily providing only one has several drawbacks. The SES method attempts to identify multiple, predictive feature subsets whose performances are statistically equivalent. In that respect the SES algorithm subsumes and extends previous feature selection algorithms, like the max-min parents and children algorithm. The SES algorithm is implemented in a function of the same name included in the R package MXM, standing for mens ex machina, meaning 'mind from the machine' in Latin. The MXM implementation of SES handles several data analysis tasks, namely classification, regression and survival analysis. In this paper we present the SES algorithm and its implementation, and provide examples of use of the SES function in R. Furthermore, we analyze three publicly available data sets to illustrate the equivalence of the signatures retrieved by SES and to contrast SES against the state-of-the-art feature selection method LASSO. Our results provide initial evidence that the two methods perform comparably well in terms of predictive accuracy and that multiple, equally predictive signatures are actually present in real-world data.
@article{Lagani_2017, added-at = {2019-02-01T14:04:22.000+0100}, author = {Lagani, Vincenzo and Athineou, Giorgos and Farcomeni, Alessio and Tsagris, Michail and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/203594f6365b0d67aa39e3ba38b4b2289/mensxmachina}, doi = {10.18637/jss.v080.i07}, interhash = {2d6c6cbe4da60ea0a19269dad768d0d4}, intrahash = {03594f6365b0d67aa39e3ba38b4b2289}, journal = {Journal of Statistical Software}, keywords = {mxmcausalpath}, number = 7, publisher = {Foundation for Open Access Statistics}, timestamp = {2021-03-10T09:22:09.000+0100}, title = {Feature Selection with the R Package {MXM}: Discovering Statistically Equivalent Feature Subsets}, url = {https://doi.org/10.18637/jss.v080.i07}, volume = 80, year = 2017 }
- G. Orfanoudaki, M. Markaki, K. Chatzi, I. Tsamardinos, and A. Economou, "MatureP: prediction of secreted proteins with exclusive information from their mature regions," Nature Scientific Reports, vol. 7, iss. 1, p. 3263, 2017. doi:10.1038/s41598-017-03557-4
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features exclusively computed from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently with ~92% success, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the flexibility required for the maintenance of non-folded states during targeting and secretion with the ability of post-secretion folding. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
@article{orfanoudaki2017maturep, added-at = {2019-02-01T14:01:30.000+0100}, author = {Orfanoudaki, Georgia and Markaki, Maria and Chatzi, Katerina and Tsamardinos, Ioannis and Economou, Anastassios}, biburl = {https://www.bibsonomy.org/bibtex/2951371b60898141f3bcdfc2141b70336/mensxmachina}, doi = {10.1038/s41598-017-03557-4}, interhash = {45976adff24808a1e1f78e330732b5ff}, intrahash = {951371b60898141f3bcdfc2141b70336}, issn = {20452322}, journal = {Nature Scientific Reports}, keywords = {mxmcausalpath}, month = {June}, number = 1, pages = 3263, refid = {Orfanoudaki2017}, timestamp = {2021-03-10T09:25:44.000+0100}, title = {MatureP: prediction of secreted proteins with exclusive information from their mature regions}, url = {https://doi.org/10.1038/s41598-017-03557-4}, volume = 7, year = 2017 }
- G. Papoutsoglou, G. Athineou, V. Lagani, I. Xanthopoulos, A. Schmidt, S. Éliás, J. Tegnér, and I. Tsamardinos, "SCENERY: a web application for (causal) network reconstruction from cytometry data," Nucleic Acids Research, vol. 45, p. W270-W275, 2017. doi:10.1093/nar/gkx448
Flow and mass cytometry technologies can probe proteins as biological markers in thousands of individual cells simultaneously, providing unprecedented opportunities for reconstructing networks of protein interactions through machine learning algorithms. The network reconstruction (NR) problem has been well studied by the machine learning community. However, the potential of available methods remains largely unknown to the cytometry community, mainly due to their intrinsic complexity and the lack of comprehensive, powerful and easy-to-use NR software implementations specific to cytometry data. To bridge this gap, we present Single CEll NEtwork Reconstruction sYstem (SCENERY), a web server featuring several standard and advanced cytometry data analysis methods coupled with NR algorithms in a user-friendly, on-line environment. In SCENERY, users may upload their data and set their own study design. The server offers several data analysis options categorized into three classes of methods: data (pre)processing, statistical analysis and NR. The server also provides interactive visualization and download of results as ready-to-publish images or multimedia reports. Its core is modular and based on the widely used and robust R platform, allowing power users to extend its functionalities by submitting their own NR methods. SCENERY is available at scenery.csd.uoc.gr or http://mensxmachina.org/en/software/.
@article{Papoutsoglou2017, added-at = {2018-12-23T19:41:26.000+0100}, author = {Papoutsoglou, Georgios and Athineou, Giorgos and Lagani, Vincenzo and Xanthopoulos, Iordanis and Schmidt, Angelika and Éliás, Szabolcs and Tegnér, Jesper and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2331d9c832468b02236d47e4b563c5738/mensxmachina}, const = {\ text}, doi = {10.1093/nar/gkx448}, interhash = {673f191749dcc1ae18b6e562a41d3176}, intrahash = {331d9c832468b02236d47e4b563c5738}, journal = {Nucleic Acids Research}, keywords = {mxmcausalpath}, month = {July}, pages = {W270-W275}, timestamp = {2021-03-10T09:51:18.000+0100}, title = {SCENERY: a web application for (causal) network reconstruction from cytometry data}, url = {https://doi.org/10.1093/nar/gkx448}, volume = 45, year = 2017 }
- S. Triantafillou, V. Lagani, C. Heinze-Deml, A. Schmidt, J. Tegner, and I. Tsamardinos, "Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells," Nature Scientific Reports, vol. 7, iss. 12724, 2017. doi:10.1038/s41598-017-08582-x
Learning the causal relationships that define a molecular system allows us to predict how the system will respond to different interventions. Distinguishing causality from mere association typically requires randomized experiments. Methods for automated causal discovery from limited experiments exist, but have so far rarely been tested in systems biology applications. In this work, we apply state-of-the-art causal discovery methods to a large collection of public mass cytometry data sets, measuring intra-cellular signaling proteins of the human immune system and their response to several perturbations. We show how different experimental conditions can be used to facilitate causal discovery, and apply two fundamental methods that produce context-specific causal predictions. Causal predictions were reproducible across independent data sets from two different studies, but often disagreed with the KEGG pathway databases. Within this context, we discuss the caveats we need to overcome for automated causal discovery to become a part of the routine data analysis in systems biology.
@article{Triantafillou2017, added-at = {2018-12-23T19:41:26.000+0100}, author = {Triantafillou, Sofia and Lagani, Vincenzo and Heinze-Deml, Christina and Schmidt, Angelika and Tegner, Jesper and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/27862d4ffb2a4ee6dfc296ff06c95d89a/mensxmachina}, const = {\ text}, doi = {10.1038/s41598-017-08582-x}, interhash = {4437d2e18e86d9d0e26274b32df00b99}, intrahash = {7862d4ffb2a4ee6dfc296ff06c95d89a}, journal = {Nature Scientific Reports}, keywords = {mxmcausalpath}, month = {October}, number = 12724, timestamp = {2021-03-10T09:30:37.000+0100}, title = {Predicting Causal Relationships from Biological Data: Applying Automated Causal Discovery on Mass Cytometry Data of Human Immune Cells}, url = {https://www.nature.com/articles/s41598-017-08582-x}, volume = 7, year = 2017 }
- K. Siomos, E. Papadaki, I. Tsamardinos, K. Kerkentzes, M. Koygioylis, and C. Trakatelli, "Prothrombotic and Endothelial Inflammatory Markers in Greek Patients with Type 2 Diabetes Compared to Non-Diabetics," Endocrinology & Metabolic Syndrome, vol. 6, iss. 1, 2017. doi:10.4172/2161-1017.1000259
Objective: To evaluate specific factors of coagulation and endothelial inflammatory markers, namely thrombomodulin, soluble receptor of the protein C (sEPCR), factor VIII, plasminogen activator inhibitor 1, von Willebrand factor, fibrinogen, fibrinogen dimers (d-dimers), high-sensitivity C-reactive protein and homocysteine, in a subset of Greek subjects with and without Type 2 (T2) diabetes. Design: 84 subjects, of whom 44 were patients with T2 diabetes, were included in the randomized comparative prospective cross-sectional study. The subjects were split into a T2 diabetics group and a group of healthy controls of similar age, anthropometric profiles and gender distribution. Results: A total of 47 variables and biomarkers, together with indicators for metabolic profiles, clinical history, detailed anthropometric profiles and traditional risk factors, were evaluated. Dipeptidyl peptidase-4 (DPP4), insulin, use of sulfonylurea, high HbA1c and glucose levels were clearly statistically differentiated between the two groups, while no other biomarkers, including the new potential indicators, were found to be different. High values of thrombomodulin and homocysteine were correlated with a rise in creatinine and thus seem to affect renal function in the diabetic patients group, while in the non-diabetics group the correlations are different, with sEPCR having a relatively strong negative correlation with renal function as measured with the Modification of Diet in Renal Disease, in agreement with the latest international findings. Conclusions: The presence of T2 diabetes in conjunction with age clearly correlates with problems in renal function; thrombomodulin and homocysteine could serve as indicators for renal damage in diabetics but not in healthy individuals. sEPCR, on the other hand, could be a potential generic indicator for renal damage. Thrombomodulin and sEPCR, as prothrombotic agents, did not show any indication that they can be utilised as markers for the prevention and/or treatment of thrombotic complications in diabetic patients.
@article{Siomos2017, added-at = {2018-12-23T19:41:26.000+0100}, author = {Siomos, Kyros and Papadaki, E and Tsamardinos, Ioannis and Kerkentzes, K and Koygioylis, M and Trakatelli, CM}, biburl = {https://www.bibsonomy.org/bibtex/24417b929f54376f819a75dd83f49e8b0/mensxmachina}, const = {\ text}, doi = {10.4172/2161-1017.1000259}, interhash = {87caff8395d570d0403ceb638ebf3c2f}, intrahash = {4417b929f54376f819a75dd83f49e8b0}, journal = {Endocrinology {\&} Metabolic Syndrome}, keywords = {Diabetes Endothelial Prothombotic Thrombomodulin inflammation markers sEPCR}, month = {February}, number = 1, timestamp = {2019-09-26T17:08:03.000+0200}, title = {Prothrombotic and Endothelial Inflammatory Markers in Greek Patients with Type 2 Diabetes Compared to Non-Diabetics}, volume = 6, year = 2017 }
2016
- G. Athineou, G. Papoutsoglou, S. Triantafillou, I. Basdekis, V. Lagani, and I. Tsamardinos, "SCENERY: a Web-Based Application for Network Reconstruction and Visualization of Cytometry Data," 10th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2016), 2016. doi:10.1007/978-3-319-40126-3_21
Cytometry techniques allow the quantification of morphological characteristics and protein abundances at a single-cell level. Data collected with these techniques can be used for addressing the fascinating, yet challenging problem of reconstructing the network of protein interactions forming signaling pathways and governing cell biological mechanisms. Network reconstruction is an established and well-studied problem in the machine learning and data mining fields, with several algorithms already available. In this paper, we present the first web-oriented application, SCENERY, that allows scientists to rapidly apply state-of-the-art network-reconstruction methods to cytometry data. SCENERY comes with an easy-to-use user interface, a modular architecture, and advanced visualization functions. The functionalities of the application are illustrated on data from a publicly available immunology experiment.
@article{Athineou2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Athineou, G. and Papoutsoglou, G. and Triantafillou, S. and Basdekis, I and Lagani, V. and Tsamardinos, I.}, biburl = {https://www.bibsonomy.org/bibtex/242d7113b974dd4dd85e248f90186fa1a/mensxmachina}, const = {\ text}, doi = {10.1007/978-3-319-40126-3_21}, interhash = {d61d9b74c61b20e751c441e8262a2062}, intrahash = {42d7113b974dd4dd85e248f90186fa1a}, journal = {10th International Conference on Practical Applications of Computational Biology {\&} Bioinformatics (PACBB 2016)}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T11:24:16.000+0100}, title = {SCENERY: a Web-Based Application for Network Reconstruction and Visualization of Cytometry Data}, url = {https://link.springer.com/chapter/10.1007/978-3-319-40126-3_21}, year = 2016 }
- P. Charonyktakis, M. Plakia, I. Tsamardinos, and M. Papadopouli, "On user-centric modular QoE prediction for VoIP based on machine-learning algorithms," IEEE Transactions on Mobile Computing, 2016. doi:10.1109/TMC.2015.2461216
@article{Charonyktakis2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Charonyktakis, Paulos and Plakia, Maria and Tsamardinos, Ioannis and Papadopouli, Maria}, biburl = {https://www.bibsonomy.org/bibtex/2fcab8121409028989d4f28a63a4d8921/mensxmachina}, const = {\ text}, doi = {10.1109/TMC.2015.2461216}, interhash = {bea43df893103d16715be91067cd10ff}, intrahash = {fcab8121409028989d4f28a63a4d8921}, journal = {IEEE Transactions on Mobile Computing}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T10:06:16.000+0100}, title = {On user-centric modular QoE prediction for voip based on machine-learning algorithms}, year = 2016 }
- J. Goveia, A. Pircher, L. Conradi, J. Kalucka, V. Lagani, M. Dewerchin, G. Eelen, R. J. DeBerardinis, I. D. Wilson, and P. Carmeliet, "Meta-analysis of clinical metabolic profiling studies in cancer: challenges and opportunities," EMBO Molecular Medicine, 2016. doi:10.15252/EMMM.201606798
@article{Goveia2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Goveia, Jermaine and Pircher, Andreas and Conradi, Lena-Christin and Kalucka, Joanna and Lagani, Vincenzo and Dewerchin, Mieke and Eelen, Guy and DeBerardinis, Ralph J and Wilson, Ian D and Carmeliet, Peter}, biburl = {https://www.bibsonomy.org/bibtex/2ebccf46027fb01bbde2b7db18de27a92/mensxmachina}, const = {\ text}, doi = {10.15252/EMMM.201606798}, interhash = {964295a9d8bf4dbde8a7956da25a610a}, intrahash = {ebccf46027fb01bbde2b7db18de27a92}, journal = {EMBO Molecular Medicine}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T10:02:13.000+0100}, title = {Meta-analysis of clinical metabolic profiling studies in cancer: challenges and opportunities}, year = 2016 }
- N. Karathanasis, I. Tsamardinos, and V. Lagani, "omicsNPC: applying the NonParametric Combination methodology to the integrative analysis of heterogeneous omics data," PloS one, 2016. doi:10.1371/journal.pone.0165545
@article{Karathanasis2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Karathanasis, N and Tsamardinos, I and Lagani, V}, biburl = {https://www.bibsonomy.org/bibtex/227463febf36217b828cdef0f97cef837/mensxmachina}, const = {\ text}, doi = {10.1371/journal.pone.0165545}, interhash = {2256403d4191925adb5c8dc54d046aa1}, intrahash = {27463febf36217b828cdef0f97cef837}, journal = {PloS one}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T10:01:26.000+0100}, title = {omicsNPC: applying the NonParametric Combination methodology to the integrative analysis of heterogeneous omics data}, year = 2016 }
- V. Lagani, S. Triantafillou, G. Ball, J. Tegner, and I. Tsamardinos, "Probabilistic computational causal discovery for systems biology," Uncertainty in Biology, 2016.
@article{Lagani2016a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, Vincenzo and Triantafillou, Sofia and Ball, Gordon and Tegner, Jesper and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2de8d28ca65b4b0bc7c101e8089021015/mensxmachina}, const = {\ text}, interhash = {d76d8d1c0d2cda475e012f4ec71e28cc}, intrahash = {de8d28ca65b4b0bc7c101e8089021015}, journal = {Uncertainty in Biology}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T09:59:45.000+0100}, title = {Probabilistic computational causal discovery for systems biology}, url = {http://link.springer.com/chapter/10.1007/978-3-319-21296-8{\_}3}, year = 2016 }
- S. Triantafillou and I. Tsamardinos, "Score based vs constraint based causal learning in the presence of confounders." 2016.
We compare score-based and constraint-based learning in the presence of latent confounders. We use a greedy search strategy to identify the best fitting maximal ancestral graph (MAG) from continuous data, under the assumption of multivariate normality. Scoring maximal ancestral graphs is based on (a) residual iterative conditional fitting [Drton et al., 2009] for obtaining maximum likelihood estimates for the parameters of a given MAG and (b) factorization and score decomposition results for mixed causal graphs [Richardson, 2009, Nowzohour et al., 2015]. We compare the score-based approach in simulated settings with two standard constraint-based algorithms: FCI and conservative FCI. Results show a promising performance of the greedy search algorithm.
@inproceedings{Triantafillou2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Triantafillou, Sofia and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/22ed5bded5d4fd2cc6a04519d34950c4f/mensxmachina}, const = {\ text}, interhash = {5ec5119e2b38606d25e2412f1b8725ee}, intrahash = {2ed5bded5d4fd2cc6a04519d34950c4f}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T09:47:44.000+0100}, title = {Score based vs constraint based causal learning in the presence of confounders}, url = {http://www.its.caltech.edu/~fehardt/UAI2016WS/papers/Triantafillou.pdf}, year = 2016 }
- G. Borboudakis and I. Tsamardinos, "Towards Robust and Versatile Causal Discovery for Business Applications." 2016.
Causal discovery algorithms can induce some of the causal relations from the data, commonly in the form of a causal network such as a causal Bayesian network. Arguably, however, all such algorithms lag far behind what is necessary for a true business application. We develop an initial version of a new, general causal discovery algorithm called ETIO with many features suitable for business applications. These include (a) ability to accept prior causal knowledge (e.g., taking senior driving courses improves driving skills), (b) admitting the presence of latent confounding factors, (c) admitting the possibility of (a certain type of) selection bias in the data (e.g., clients sampled mostly from a given region), (d) ability to analyze data with missing-by-design (i.e., not planned to measure) values (e.g., if two companies merge and their databases measure different attributes), and (e) ability to analyze data from different interventions (e.g., prior and posterior to an advertisement campaign). ETIO is an instance of the logical approach to integrative causal discovery that has been relatively recently introduced and enables the solution of complex reverse-engineering problems in causal discovery. ETIO is compared against the state-of-the-art and is shown to be more effective in terms of speed, with only a slight degradation in terms of learning accuracy, while incorporating all the features above. The code is available on the mensxmachina.org website.
@inproceedings{Borboudakis2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Borboudakis, Giorgos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2ea490963b24b8468630bd6c9a14043e0/mensxmachina}, const = {\ text}, interhash = {019ba4ae770df56fc07b50d031f8f627}, intrahash = {ea490963b24b8468630bd6c9a14043e0}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T09:47:14.000+0100}, title = {Towards Robust and Versatile Causal Discovery for Business Applications}, url = {https://www.kdd.org/kdd2016/papers/files/rpp1045-borboudakisA.pdf}, year = 2016 }
- A. Roumpelaki, G. Borboudakis, S. Triantafillou, and I. Tsamardinos, "Marginal causal consistency in constraint-based causal learning." 2016.
Maximal Ancestral Graphs (MAGs) are probabilistic graphical models that can model the distribution and causal properties of a set of variables in the presence of latent confounders. They are closed under marginalization. Invariant pairwise features of a class of Markov equivalent MAGs can be learnt from observational data sets using the FCI algorithm and its variations (such as conservative FCI and order independent FCI). We investigate the consistency of causal features (causal ancestry relations) obtained by FCI in different marginals of a single data set. In principle, the causal relationships identified by FCI on a data set D measuring a set of variables V should not conflict with the output of FCI on marginal data sets including only subsets of V. In practice, however, FCI is prone to error propagation, and running FCI in different marginals results in inconsistent causal predictions. We introduce the term marginal causal consistency to denote the consistency of causal relationships when learning marginal distributions, and investigate the marginal causal consistency of different FCI variations. Results indicate that marginal causal consistency varies for different algorithms, and is also sensitive to network density and marginal size.
@inproceedings{Roumpelaki2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Roumpelaki, Anna and Borboudakis, Giorgos and Triantafillou, Sofia and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2fa35e747a712b0695a565448282ceb55/mensxmachina}, const = {\ text}, interhash = {dceb33794f0e0261753e329c9dbb7daf}, intrahash = {fa35e747a712b0695a565448282ceb55}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T09:39:45.000+0100}, title = {Marginal causal consistency in constraint-based causal learning}, url = {http://www.its.caltech.edu/~fehardt/UAI2016WS/papers/Roumpelaki.pdf}, year = 2016 }
- V. Lagani, A. D. Karozou, D. Gomez-Cabrero, G. Silberberg, and I. Tsamardinos, "A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions," BMC Bioinformatics, iss. S5, 2016. doi:10.1186/s12859-016-1038-1
@article{Lagani2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, Vincenzo and Karozou, Argyro D. and Gomez-Cabrero, David and Silberberg, Gilad and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2ea840a5240ea520417211472b272b7ef/mensxmachina}, const = {\ text}, doi = {10.1186/s12859-016-1038-1}, interhash = {e61c12b1a68f6e300b1e7d54651102b2}, intrahash = {ea840a5240ea520417211472b272b7ef}, journal = {BMC Bioinformatics}, keywords = {mxmcausalpath}, number = {S5}, timestamp = {2021-03-10T09:29:54.000+0100}, title = {A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions}, url = {https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1038-1}, year = 2016 }
- O. D. Roe, M. Markaki, R. Mjelle, P. Sætrom, I. Tsamardinos, and V. Lagani, "Serum microRNAs/enriched pathways in lung cancer 1-4 years before diagnosis: A pilot study from the HUNT Biobank, Norway." 2016.
Background: Early detection of lung cancer could increase survival and cure rates, as low stage and surgery are positive prognostic factors. Screening of ever-smokers and asbestos-exposed individuals by non-invasive methods is a desired path to increase the rate of early detection and cure in the future. Moreover, a circulating early microRNA signature may decipher lung cancer biology. Methods: Serum samples from the HUNT3 Biobank, Levanger, Norway were profiled with separate total microRNA sequencing (Illumina). The samples included lung adenocarcinoma (n=4), squamous cell carcinoma (n=5) and small-cell carcinoma (n=5) cases collected 1-4 years before diagnosis, along with age- and sex-matched non-cancer individuals (n=28), ratio 1:2; the ratio of never-smokers to ever-smokers among controls was 50/50. The differentially expressed (DE) microRNAs were analyzed for enrichment by DIANA-miRPath v3.0. The target genes of each microRNA included in the signature were determined with the DIANA-microT web server v5.0 and the union of all the microRNA targets was assessed for enrichment in KEGG pathways. Pathways with FDR-corrected p-value < 0.05 are reported in the results. Results: We detected 12 DE microRNAs in the serum 1-4 years before diagnosis in adenocarcinoma plus squamous cell carcinoma (NSCLC) versus controls. There were 9 DE microRNAs in small-cell carcinoma. Several pathways were enriched, including pathways for their respective cancer types (Ex. Table 1). Conclusions: In this relatively small but unique pilot study we identified microRNAs that were significantly DE in serum of lung cancer patients 1-4 years prior to diagnosis. These specific microRNAs also target cancer-specific pathways. These are preliminary results whose validation in a larger cohort is still pending. There is hope that significant biomarkers can soon be discovered for early detection of lung cancer within the microRNA family.
@inproceedings{Roe2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Roe, D Oluf and Markaki, Maria and Mjelle, Robin and S{\ae}trom, P{\aa}l and Tsamardinos, Ioannis and Lagani, Vincenzo}, biburl = {https://www.bibsonomy.org/bibtex/2099bcbe00e420b5aff734220ab775cea/mensxmachina}, const = {\ text}, interhash = {10f817b55c9b442b534651768e3d6037}, intrahash = {099bcbe00e420b5aff734220ab775cea}, keywords = {imported}, timestamp = {2020-02-05T11:47:34.000+0100}, title = {Serum microRNAs/enriched pathways in lung cancer 1-4 years before diagnosis: A pilot study from the HUNT Biobank, Norway.}, url = {http://abstracts.asco.org/176/AbstView{\_}176{\_}170889.html}, year = 2016 }
- A. I. Robles, K. Standahl Olsen, D. W. Tsui, V. Georgoulias, J. Creaney, K. Dobra, M. Vyberg, N. Minato, R. A. Anders, A. Børresen‑Dale, J. Zhou, P. Saetrom, B. Schnack Nielsen, M. B. Kirschner, H. E. Krokan, V. Papadimitrakopoulou, I. Tsamardinos, and O. D. Røe, "Excerpts from the 1st international NTNU symposium on current and future clinical biomarkers of cancer: innovation and implementation, June 16th and 17th 2016, Trondheim, Norway," Journal of Translational Medicine, 2016. doi:10.1186/s12967-016-1059-6
@article{Robles2016, added-at = {2018-12-23T19:41:26.000+0100}, author = {Robles, Ana I and {Standahl Olsen}, Karina and Tsui, Dana WT and Georgoulias, Vassilis and Creaney, Jenette and Dobra, Katalin and Vyberg, Mogens and Minato, Nagahiro and Anders, Robert A and B{\o}rresen‑Dale, Anne‑Lise and Zhou, Jianwei and Saetrom, P{\aa}l and {Schnack Nielsen}, Boye and Kirschner, Michaela B and Krokan, Hans E and Papadimitrakopoulou, Vassiliki and Tsamardinos, Ioannis and R{\o}e, Oluf D}, biburl = {https://www.bibsonomy.org/bibtex/20505db6f9b7778af28856b37299b951d/mensxmachina}, const = {\ text}, doi = {10.1186/s12967-016-1059-6}, interhash = {98cde881f3992117fba117404c966296}, intrahash = {0505db6f9b7778af28856b37299b951d}, journal = {Journal of Translational Medicine}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Excerpts from the 1st international NTNU symposium on current and future clinical biomarkers of cancer: innovation and implementation, June 16th and 17th 2016, Trondheim, Norway}, year = 2016 }
2015
- S. Triantafillou and I. Tsamardinos, "Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets," Journal of Machine Learning Research, 2015.
Scientific practice typically involves repeatedly studying a system, each time trying to unravel a different perspective. In each study, the scientist may take measurements under different experimental conditions (interventions, manipulations, perturbations) and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different data distributions. In this work, we present the algorithm COmbINE, which accepts a collection of data sets over overlapping variable sets under different experimental conditions; COmbINE then outputs a summary of all causal models indicating the invariant and variant structural characteristics of all models that simultaneously fit all of the input data sets. COmbINE converts estimated dependencies and independencies in the data into path constraints on the data-generating causal model and encodes them as a SAT instance. The algorithm is sound and complete in the sample limit. To account for conflicting constraints arising from statistical errors, we introduce a general method for sorting constraints in order of confidence, computed as a function of their corresponding p-values. In our empirical evaluation, COmbINE outperforms in terms of efficiency the only pre-existing similar algorithm; the latter additionally admits feedback cycles, but does not admit conflicting constraints, which hinders its applicability to real data. As a proof-of-concept, COmbINE is employed to co-analyze 4 real, mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions.
@article{Triantafillou2014, added-at = {2018-12-23T19:41:26.000+0100}, author = {Triantafillou, Sofia and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/291de28a71f817882d89403435f286cdb/mensxmachina}, const = {\ text}, interhash = {2011d37473e2ce71596a56051c53e4a0}, intrahash = {91de28a71f817882d89403435f286cdb}, journal = {Journal of Machine Learning Research}, keywords = {mxmcausalpath}, timestamp = {2021-03-18T08:39:02.000+0100}, title = {Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets}, url = {http://arxiv.org/abs/1403.2150}, year = 2015 }
- I. Tsamardinos, M. Tsagris, and V. Lagani, "Feature selection for longitudinal data," Proceedings of the 10th conference of the Hellenic Society for Computational Biology & Bioinformatics (HSCBB15), iss. 1, 2015.
@article{Tsamardinos2015a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, Ioannis and Tsagris, Michail and Lagani, Vincenzo}, biburl = {https://www.bibsonomy.org/bibtex/2b4fee8042208f2ff10b9c0feebc0123f/mensxmachina}, const = {\ text}, interhash = {23c07480aea897d3b63d37d2f003e443}, intrahash = {b4fee8042208f2ff10b9c0feebc0123f}, journal = {Proceedings of the 10th conference of the Hellenic Society for Computational Biology {\&} Bioinformatics (HSCBB15)}, keywords = {mxmcausalpath}, number = 1, timestamp = {2021-03-10T09:58:42.000+0100}, title = {Feature selection for longitudinal data}, year = 2015 }
- G. Borboudakis and I. Tsamardinos, "Bayesian Network Learning with Discrete Case-Control Data," Uncertainty in Artificial Intelligence (UAI), 2015.
We address the problem of learning Bayesian networks from discrete, unmatched case-control data using specialized conditional independence tests. Those tests can also be used for learning other types of graphical models or for feature selection. We also propose a post-processing method that can be applied in conjunction with any Bayesian network learning algorithm. In simulations we show that our methods are able to deal with selection bias from case-control data.
@article{Borboudakis2015, added-at = {2018-12-23T19:41:26.000+0100}, author = {Borboudakis, Giorgos and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2b9228070312f82547513ea88f1525650/mensxmachina}, const = {\ text}, interhash = {456f2fd39d47f4dc5fd36f1ca5e44891}, intrahash = {b9228070312f82547513ea88f1525650}, journal = {Uncertainty in Artificial Intelligence (UAI)}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T09:35:58.000+0100}, title = {Bayesian Network Learning with Discrete Case-Control Data.}, url = {http://auai.org/uai2015/proceedings/papers/188.pdf}, year = 2015 }
- A. Alexandridis, G. Borboudakis, and A. Mouchtaris, "Addressing the data-association problem for multiple sound source localization using DOA data estimates," 23rd European Signal Processing Conference (EUSIPCO), 2015. doi:10.1109/EUSIPCO.2015.7362644
In this paper, we consider the data association problem that arises when localizing multiple sound sources using direction of arrival (DOA) estimates from multiple microphone arrays. In such a scenario, the association of the DOAs across the arrays that correspond to the same source is unknown and must be found for accurate localization. We present an association algorithm that finds the correct DOA association to the sources based on features extracted for each source that we propose. Our method results in high association and localization accuracy in scenarios with missed detections, reverberation, and noise and outperforms other recently proposed methods.
@article{Alexandridis2015, added-at = {2018-12-23T19:41:26.000+0100}, author = {Alexandridis, Anastasios and Borboudakis, Giorgos and Mouchtaris, Athanasios}, biburl = {https://www.bibsonomy.org/bibtex/2a07814247fa3c05b693af8fb9d710f8f/mensxmachina}, const = {\ text}, doi = {10.1109/EUSIPCO.2015.7362644}, interhash = {622eccd20d79a3c3256efa953df47acb}, intrahash = {a07814247fa3c05b693af8fb9d710f8f}, journal = {23rd European Signal Processing Conference (EUSIPCO)}, keywords = {imported}, timestamp = {2020-04-15T10:34:32.000+0200}, title = {Addressing the data-association problem for multiple sound source localization using DOA data estimates}, url = {https://ieeexplore.ieee.org/document/7362644}, year = 2015 }
- N. Karathanasis, I. Tsamardinos, and P. Poirazi, "MiRduplexSVM: A high-Performing miRNA-duplex prediction and evaluation methodology," PloS one, iss. 5, 2015. doi:10.1371/journal.pone.0126151
We address the problem of predicting the position of a miRNA duplex on a microRNA hairpin via the development and application of a novel SVM-based methodology. Our method combines a unique problem representation and an unbiased optimization protocol to learn from miRBase 19.0 an accurate predictive model, termed MiRduplexSVM. This is the first model that provides precise information about all four ends of the miRNA duplex. We show that (a) our method outperforms four state-of-the-art tools, namely MaturePred, MiRPara, MatureBayes, MiRdup as well as a Simple Geometric Locator when applied on the same training datasets employed for each tool and evaluated on a common blind test set. (b) In all comparisons, MiRduplexSVM shows superior performance, achieving up to a 60% increase in prediction accuracy for mammalian hairpins and can generalize very well on plant hairpins, without any special optimization. (c) The tool has a number of important applications such as the ability to accurately predict the miRNA or the miRNA*, given the opposite strand of a duplex. Its performance on this task is superior to the 2-nt overhang rule commonly used in computational studies and similar to that of a comparative genomic approach, without the need for prior knowledge or the complexity of performing multiple alignments. Finally, it is able to evaluate novel, potential miRNAs found either computationally or experimentally. In relation to recent confidence evaluation methods used in miRBase, MiRduplexSVM was successful in identifying high confidence potential miRNAs.
@article{Karathanasis2015, added-at = {2018-12-23T19:41:26.000+0100}, author = {Karathanasis, N and Tsamardinos, I and Poirazi, P}, biburl = {https://www.bibsonomy.org/bibtex/2059b2931d111f346c30207447d631175/mensxmachina}, const = {\ text}, doi = {10.1371/journal.pone.0126151}, interhash = {b4aad6ef0fc292d1003c84450b4b4890}, intrahash = {059b2931d111f346c30207447d631175}, journal = {PloS one}, keywords = {imported}, number = 5, timestamp = {2020-04-15T10:28:17.000+0200}, title = {MiRduplexSVM: A high-Performing miRNA-duplex prediction and evaluation methodology}, url = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0126151}, year = 2015 }
- V. Lagani, F. Chiarugi, S. Thomson, J. Fursse, E. Lakasing, R. W. Jones, and I. Tsamardinos, "Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data," Journal of Diabetes and its Complications, iss. 4, 2015. doi:10.1016/j.jdiacomp.2015.03.001
AIM: To derive and validate a set of computational models able to assess the risk of developing complications and experiencing adverse events for patients with diabetes. The models are developed on data from the Diabetes Control and Complications Trial (DCCT) and the Epidemiology of Diabetes Interventions and Complications (EDIC) studies, and are validated on an external, retrospectively collected cohort. METHODS: We selected fifty-one clinical parameters measured at baseline during the DCCT as potential risk factors for the following adverse outcomes: Cardiovascular Diseases (CVD), Hypoglycemia, Ketoacidosis, Microalbuminuria, Proteinuria, Neuropathy and Retinopathy. For each outcome we applied a data-mining analysis protocol in order to identify the best-performing signature, i.e., the smallest set of clinical parameters that, considered jointly, are maximally predictive for the selected outcome. The predictive models built on the selected signatures underwent both an internal validation on the DCCT/EDIC data and an external validation on a retrospective cohort of 393 diabetes patients (49 Type I and 344 Type II) from the Chorleywood Medical Center, UK. RESULTS: The selected predictive signatures contain five to fifteen risk factors, depending on the specific outcome. Internal validation performances, as measured by the Concordance Index (CI), range from 0.62 to 0.83, indicating good predictive power. The models achieved comparable performances for the Type I and, quite surprisingly, Type II external cohorts. CONCLUSIONS: Data-mining analyses of the DCCT/EDIC data allow the identification of accurate predictive models for diabetes-related complications. We also present initial evidence that these models can be applied to a more recent, European population.
@article{Lagani2015a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, Vincenzo and Chiarugi, Franco and Thomson, Shona and Fursse, Jo and Lakasing, Edin and Jones, Russell W. and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2adfe440f7e1b7ba730729fc0df7738ff/mensxmachina}, const = {\ text}, doi = {10.1016/j.jdiacomp.2015.03.001}, interhash = {cd4b7f972f2fb491eda2ab18637147bf}, intrahash = {adfe440f7e1b7ba730729fc0df7738ff}, journal = {Journal of Diabetes and its Complications}, keywords = {imported}, number = 4, timestamp = {2020-04-15T10:22:28.000+0200}, title = {Development and validation of risk assessment models for diabetes-related complications based on the DCCT/EDIC data}, url = {http://www.sciencedirect.com/science/article/pii/S1056872715000768}, year = 2015 }
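The Concordance Index (CI) reported in the abstract above can be illustrated with a short Python sketch. It assumes Harrell's formulation for right-censored data, which the abstract does not specify; the function name `concordance_index` and the toy follow-up data are hypothetical and are not taken from the paper. Pairs with tied times are simply skipped here, which is one of several possible conventions.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell-style C: share of comparable pairs in which the subject
    with the higher predicted risk experiences the event first."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder the pair so that subject i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time is an observed event
        # (not a censoring) and the two times differ.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0   # higher risk failed first: concordant
        elif risks[i] == risks[j]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up time, event flag (0 = censored), risk score.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
ci = concordance_index(times, events, risks)
print(f"CI = {ci:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported range of 0.62 to 0.83 should be read.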
- V. Lagani, F. Chiarugi, D. Manousos, V. Verma, J. Fursse, M. Kostas, and I. Tsamardinos, "Realization of a service for the long-term risk assessment of diabetes-related complications," Journal of Diabetes and its Complications, iss. 5, 2015. doi:10.1016/j.jdiacomp.2015.03.011
AIM: We present a computerized system for the assessment of the long-term risk of developing diabetes-related complications. METHODS: The core of the system consists of a set of predictive models, developed through a data-mining/machine-learning approach, which are able to evaluate individual patient profiles and provide personalized risk assessments. Missing data is a common issue in (electronic) patient records, thus the models are paired with a module for the intelligent management of missing information. RESULTS: The system has been deployed and made publicly available as a Web service, and it has been fully integrated within the diabetes-management platform developed by the European project REACTION. Preliminary usability tests showed that the clinicians judged the models useful for risk assessment and for communicating the risk to the patient. Furthermore, the system performs as well as the United Kingdom Prospective Diabetes Study (UKPDS) Risk Engine when both systems are tested on an independent cohort of UK diabetes patients. CONCLUSIONS: Our work provides a working example of a risk-stratification tool that is (a) specific for diabetes patients, (b) able to handle several different diabetes-related complications, and (c) performing as well as the widely known UKPDS Risk Engine on an external validation cohort.
@article{Lagani2015, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, Vincenzo and Chiarugi, Franco and Manousos, Dimitris and Verma, Vivek and Fursse, Joanna and Kostas, Marias and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/26a1f5560759c7d75223dc70f0731f8df/mensxmachina}, const = {\ text}, doi = {10.1016/j.jdiacomp.2015.03.011}, interhash = {2311aa7807ae268a224a81782d88bea7}, intrahash = {6a1f5560759c7d75223dc70f0731f8df}, journal = {Journal of Diabetes and its Complications}, keywords = {imported}, number = 5, timestamp = {2020-04-15T10:15:41.000+0200}, title = {Realization of a service for the long-term risk assessment of diabetes-related complications}, url = {http://www.sciencedirect.com/science/article/pii/S1056872715001075}, year = 2015 }
- I. Tsamardinos, A. Rakhshani, and V. Lagani, "Performance-estimation properties of cross-validation based protocols with simultaneous hyper-parameter optimization," International Journal on Artificial Intelligence Tools, 2015. doi:10.1142/S0218213015400230
In a typical supervised data analysis task, one needs to perform the following two tasks: (a) select an optimal combination of learning methods (e.g., for variable selection and classifier) and tune their hyper-parameters (e.g., K in K-NN), also called model selection, and (b) provide an estimate of the performance of the final, reported model. Combining the two tasks is not trivial because when one selects the set of hyper-parameters that seem to provide the best estimated performance, this estimation is optimistic (biased/overfitted) due to performing multiple statistical comparisons. In this paper, we discuss the theoretical properties of performance estimation when model selection is present and we confirm that the simple Cross-Validation with model selection is indeed optimistic (overestimates performance) in small sample scenarios and should be avoided. We present in detail and investigate the theoretical properties of the Nested Cross Validation and a method by Tibshirani and Tibshirani for removing the estimation bias. In computational experiments with real datasets both protocols provide conservative estimation of performance and should be preferred. These statements hold true even if feature selection is performed as preprocessing.
@article{Tsamardinos2015, abstract = {In a typical supervised data analysis task, one needs to perform the following two tasks: (a) select an optimal combination of learning methods (e.g., for variable selection and classifier) and tune their hyper-parameters (e.g., K in K-NN), also called model selection, and (b) provide an estimate of the performance of the final, reported model. Combining the two tasks is not trivial because when one selects the set of hyper-parameters that seem to provide the best estimated performance, this estimation is optimistic (biased/overfitted) due to performing multiple statistical comparisons. In this paper, we discuss the theoretical properties of performance estimation when model selection is present and we confirm that the simple Cross-Validation with model selection is indeed optimistic (overestimates performance) in small sample scenarios and should be avoided. We present in detail and investigate the theoretical properties of the Nested Cross Validation and a method by Tibshirani and Tibshirani for removing the estimation bias. In computational experiments with real datasets both protocols provide conservative estimation of performance and should be preferred. 
These statements hold true even if feature selection is performed as preprocessing.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, Ioannis and Rakhshani, Amin and Lagani, Vincenzo}, biburl = {https://www.bibsonomy.org/bibtex/27c86830c58f7710c1082594790eec91f/mensxmachina}, const = {\ text}, doi = {10.1142/S0218213015400230}, interhash = {c3ce54d5e5ba5b97cd10d3208de739b1}, intrahash = {7c86830c58f7710c1082594790eec91f}, journal = {International Journal on Artificial Intelligence Tools}, keywords = {imported}, timestamp = {2020-04-15T10:13:04.000+0200}, title = {Performance-estimation properties of cross-validation based protocols with simultaneous hyper-parameter optimization}, url = {https://www.researchgate.net/publication/284030151_Performance-Estimation_Properties_of_Cross-Validation-Based_Protocols_with_Simultaneous_Hyper-Parameter_Optimization}, year = 2015 }
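The nested cross-validation protocol named in the abstract above can be illustrated with a small, self-contained sketch. It is not the authors' code: the 1-D K-NN classifier, the pure-noise toy data, the grid of K values, and every function name here are assumptions chosen only to show the shape of the protocol. The inner loop selects K using the outer-training part alone, and the outer loop scores that choice on data the selection never saw.

```python
import random
from statistics import mean

def make_folds(n, n_folds, rng):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def split(data, fold):
    """Return (train, test), where test holds the rows indexed by fold."""
    held_out = set(fold)
    train = [row for i, row in enumerate(data) if i not in held_out]
    test = [data[i] for i in fold]
    return train, test

def knn_predict(train, x, k):
    """Majority vote of the k nearest 1-D neighbours (labels are 0 or 1)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > len(nearest) else 0

def accuracy(train, test, k):
    """Fraction of test rows that K-NN trained on `train` labels correctly."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)

def cv_accuracy(data, k, folds):
    """Plain cross-validated accuracy of K-NN with a fixed k."""
    return mean(accuracy(*split(data, fold), k) for fold in folds)

def select_k(data, grid, n_folds, rng):
    """Model selection: return (best k, its CV accuracy) over the grid."""
    folds = make_folds(len(data), n_folds, rng)
    scores = {k: cv_accuracy(data, k, folds) for k in grid}
    best = max(grid, key=scores.get)
    return best, scores[best]

def nested_cv_accuracy(data, grid, outer_folds, inner_folds, rng):
    """Outer loop estimates performance; inner loop tunes k on outer-train only."""
    scores = []
    for fold in make_folds(len(data), outer_folds, rng):
        train, test = split(data, fold)
        best_k, _ = select_k(train, grid, inner_folds, rng)
        scores.append(accuracy(train, test, best_k))
    return mean(scores)

rng = random.Random(0)
# Hypothetical pure-noise data: labels are independent of the feature.
data = [(rng.random(), rng.randint(0, 1)) for _ in range(60)]
grid = [1, 3, 5, 7]
best_k, winner_score = select_k(data, grid, 5, rng)
nested = nested_cv_accuracy(data, grid, 5, 4, rng)
print(f"selected k={best_k}, CV score of winner={winner_score:.2f}, nested CV={nested:.2f}")
```

The score returned by `select_k` is the maximum over several noisy estimates, which is the source of the optimism the abstract describes. The nested estimate avoids reusing the same folds for both selection and evaluation.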
2014
- C. Papagiannopoulou, G. Tsoumakas, and I. Tsamardinos, "Discovering and Exploiting Entailment Relationships in Multi-Label Learning," ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2015 (KDD), 2014. doi:10.1145/2783258.2783302
This work presents a probabilistic method for enforcing adherence of the marginal probabilities of a multi-label model to automatically discovered deterministic relationships among labels. In particular we focus on discovering two kinds of relationships among the labels. The first one concerns pairwise positive entailment: pairs of labels, where the presence of one implies the presence of the other in all instances of a dataset. The second concerns exclusion: sets of labels that do not coexist in the same instances of the dataset. These relationships are represented as a deterministic Bayesian network. Marginal probabilities are entered as soft evidence in the network and through probabilistic inference become consistent with the discovered knowledge. Our approach offers robust improvements in mean average precision compared to the standard binary relevance approach across all 12 datasets involved in our experiments. The discovery process helps interesting implicit knowledge to emerge, which could be useful in itself.
@article{Papagiannopoulou2014, abstract = {This work presents a probabilistic method for enforcing adherence of the marginal probabilities of a multi-label model to automatically discovered deterministic relationships among labels. In particular we focus on discovering two kinds of relationships among the labels. The first one concerns pairwise positive entailment: pairs of labels, where the presence of one implies the presence of the other in all instances of a dataset. The second concerns exclusion: sets of labels that do not coexist in the same instances of the dataset. These relationships are represented as a deterministic Bayesian network. Marginal probabilities are entered as soft evidence in the network and through probabilistic inference become consistent with the discovered knowledge. Our approach offers robust improvements in mean average precision compared to the standard binary relevance approach across all 12 datasets involved in our experiments. The discovery process helps interesting implicit knowledge to emerge, which could be useful in itself.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Papagiannopoulou, Christina and Tsoumakas, Grigorios and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2f1f5759a70d82586c239e475d22cb9dc/mensxmachina}, const = {\ text}, doi = {10.1145/2783258.2783302}, interhash = {96b6e84d91b5af7efb602eaefe51e5cc}, intrahash = {f1f5759a70d82586c239e475d22cb9dc}, journal = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2015 (KDD)}, keywords = {mxmcausalpath}, timestamp = {2021-03-10T11:23:30.000+0100}, title = {Discovering and Exploiting Entailment Relationships in Multi-Label Learning}, url = {http://arxiv.org/abs/1404.4038}, year = 2014 }
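The two kinds of label relationships defined in the abstract above can be made concrete with a short sketch. This is an illustration under assumptions, not the paper's implementation: instances are modelled as plain sets of label names, exclusion is restricted to pairs (the abstract speaks of sets of labels), and the function names and toy data are hypothetical. The Bayesian-network inference step is not covered.

```python
from itertools import combinations, permutations

def positive_entailments(instances, labels):
    """Ordered pairs (a, b) such that every instance containing a also contains b."""
    found = []
    for a, b in permutations(labels, 2):
        with_a = [inst for inst in instances if a in inst]
        # Require at least one supporting instance so the rule is not vacuous.
        if with_a and all(b in inst for inst in with_a):
            found.append((a, b))
    return found

def pairwise_exclusions(instances, labels):
    """Unordered pairs (a, b) that never appear together in any instance."""
    return [
        (a, b)
        for a, b in combinations(labels, 2)
        if not any(a in inst and b in inst for inst in instances)
    ]

# Hypothetical toy dataset: each instance is the set of labels it carries.
instances = [{"dog", "animal"}, {"cat", "animal"}, {"animal"}, {"car"}]
labels = sorted(set().union(*instances))

print(positive_entailments(instances, labels))
# [('cat', 'animal'), ('dog', 'animal')]
print(pairwise_exclusions(instances, labels))
# [('animal', 'car'), ('car', 'cat'), ('car', 'dog'), ('cat', 'dog')]
```

Both checks are exact, matching the deterministic ("in all instances") wording of the abstract. A tolerance for noisy labels would be a separate design choice that the abstract does not describe.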
- N. Karathanasis, I. Tsamardinos, and P. Poirazi, "Don't use a cannon to kill the miRNA mosquito," Bioinformatics, iss. 7, 2014. doi:10.1093/bioinformatics/btu100
@article{Karathanasis2014, added-at = {2018-12-23T19:41:26.000+0100}, author = {Karathanasis, N and Tsamardinos, I and Poirazi, P}, biburl = {https://www.bibsonomy.org/bibtex/2620f0e016f9fe71f2c12fc76e78d35cb/mensxmachina}, const = {\ text}, doi = {10.1093/bioinformatics/btu100}, interhash = {12404079301a6c3eace29c13f703f65a}, intrahash = {620f0e016f9fe71f2c12fc76e78d35cb}, journal = {Bioinformatics}, keywords = {imported}, number = 7, timestamp = {2020-04-15T11:12:25.000+0200}, title = {Don't use a cannon to kill the miRNA mosquito}, url = {https://bioinformatics.oxfordjournals.org/content/early/2014/03/12/bioinformatics.btu100.full}, year = 2014 }
- K. Kerkentzes, V. Lagani, I. Tsamardinos, M. Vyberg, and O. Røe, "Hidden treasures in “ancient” microarrays: gene-expression portrays biology and potential resistance pathways of major lung cancer subtypes and normal," Frontiers 2014, iss. 251, 2014. doi:10.3389/fonc.2014.00251
OBJECTIVE: Novel statistical methods and increasingly more accurate gene annotations can transform "old" biological data into a renewed source of knowledge with potential clinical relevance. Here, we provide an in silico proof-of-concept by extracting novel information from a high-quality mRNA expression dataset, originally published in 2001, using state-of-the-art bioinformatics approaches. METHODS: The dataset consists of histologically defined cases of lung adenocarcinoma (AD), squamous (SQ) cell carcinoma, small-cell lung cancer, carcinoid, metastasis (breast and colon AD), and normal lung specimens (203 samples in total). A battery of statistical tests was used for identifying differential gene expressions, diagnostic and prognostic genes, enriched gene ontologies, and signaling pathways. RESULTS: Our results showed that gene expressions faithfully recapitulate immunohistochemical subtype markers, as chromogranin A in carcinoids, cytokeratin 5, p63 in SQ, and TTF1 in non-squamous types. Moreover, biological information with putative clinical relevance was revealed as potentially novel diagnostic genes for each subtype with specificity 93-100% (AUC = 0.93-1.00). Cancer subtypes were characterized by (a) differential expression of treatment target genes as TYMS, HER2, and HER3 and (b) overrepresentation of treatment-related pathways like cell cycle, DNA repair, and ERBB pathways. The vascular smooth muscle contraction, leukocyte trans-endothelial migration, and actin cytoskeleton pathways were overexpressed in normal tissue. CONCLUSION: Reanalysis of this public dataset displayed the known biological features of lung cancer subtypes and revealed novel pathways of potentially clinical importance. The findings also support our hypothesis that even old omics data of high quality can be a source of significant biological information when appropriate bioinformatics methods are used.
@article{Kerkentzes2014, abstract = {OBJECTIVE: Novel statistical methods and increasingly more accurate gene annotations can transform "old" biological data into a renewed source of knowledge with potential clinical relevance. Here, we provide an in silico proof-of-concept by extracting novel information from a high-quality mRNA expression dataset, originally published in 2001, using state-of-the-art bioinformatics approaches. METHODS: The dataset consists of histologically defined cases of lung adenocarcinoma (AD), squamous (SQ) cell carcinoma, small-cell lung cancer, carcinoid, metastasis (breast and colon AD), and normal lung specimens (203 samples in total). A battery of statistical tests was used for identifying differential gene expressions, diagnostic and prognostic genes, enriched gene ontologies, and signaling pathways. RESULTS: Our results showed that gene expressions faithfully recapitulate immunohistochemical subtype markers, as chromogranin A in carcinoids, cytokeratin 5, p63 in SQ, and TTF1 in non-squamous types. Moreover, biological information with putative clinical relevance was revealed as potentially novel diagnostic genes for each subtype with specificity 93-100% (AUC = 0.93-1.00). Cancer subtypes were characterized by (a) differential expression of treatment target genes as TYMS, HER2, and HER3 and (b) overrepresentation of treatment-related pathways like cell cycle, DNA repair, and ERBB pathways. The vascular smooth muscle contraction, leukocyte trans-endothelial migration, and actin cytoskeleton pathways were overexpressed in normal tissue. CONCLUSION: Reanalysis of this public dataset displayed the known biological features of lung cancer subtypes and revealed novel pathways of potentially clinical importance. 
The findings also support our hypothesis that even old omics data of high quality can be a source of significant biological information when appropriate bioinformatics methods are used.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Kerkentzes, K and Lagani, V and Tsamardinos, I and Vyberg, M and R{\o}e, OD}, biburl = {https://www.bibsonomy.org/bibtex/2558b2afd4c024bb7acf7c58ca876f720/mensxmachina}, const = {\ text}, doi = {10.3389/fonc.2014.00251}, interhash = {a7da10e8266fb325ebdaa923449dbe07}, intrahash = {558b2afd4c024bb7acf7c58ca876f720}, journal = {Frontiers 2014}, keywords = {imported}, number = 251, timestamp = {2020-04-15T11:10:37.000+0200}, title = {Hidden treasures in “ancient” microarrays: gene-expression portrays biology and potential resistance pathways of major lung cancer subtypes and normal}, url = {http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178426/}, year = 2014 }
- S. Triantafilou, I. Tsamardinos, and A. Roumpelaki, "Learning Neighborhoods of High Confidence in Constraint-Based Causal Discovery," Springer 2014, 2014. doi:10.1007/978-3-319-11433-0_32
Constraint-based causal discovery algorithms use conditional independence tests to identify the skeleton and invariant orientations of a causal network. Two major disadvantages of constraint-based methods are that (a) they are sensitive to error propagation and (b) the results of the conditional independence tests are binarized by being compared to a hard threshold; thus, the resulting networks are not easily evaluated in terms of reliability. We present PROPeR, a method for estimating posterior probabilities of pairwise relations (adjacencies and non-adjacencies) of a network skeleton as a function of the corresponding p-values. This novel approach has no significant computational overhead and can scale up to the same number of variables as the constraint-based algorithm of choice. We also present BiND, an algorithm that identifies neighborhoods of high structural confidence on causal networks learnt with constraint-based algorithms. The algorithm uses PROPeR to estimate the confidence of all pairwise relations. Maximal neighborhoods of the skeleton with minimum confidence above a user-defined threshold are then identified using the Bron-Kerbosch algorithm for identifying maximal cliques. In our empirical evaluation, we demonstrate that (a) the posterior probability estimates for pairwise relations are reasonable and comparable with estimates obtained using more expensive Bayesian methods and (b) BiND identifies sub-networks with higher structural precision and recall than the output of the constraint-based algorithm.
@article{Triantafilou2014, abstract = {Constraint-based causal discovery algorithms use conditional independence tests to identify the skeleton and invariant orientations of a causal network. Two major disadvantages of constraint-based methods are that (a) they are sensitive to error propagation and (b) the results of the conditional independence tests are binarized by being compared to a hard threshold; thus, the resulting networks are not easily evaluated in terms of reliability. We present PROPeR, a method for estimating posterior probabilities of pairwise relations (adjacencies and non-adjacencies) of a network skeleton as a function of the corresponding p-values. This novel approach has no significant computational overhead and can scale up to the same number of variables as the constraint-based algorithm of choice. We also present BiND, an algorithm that identifies neighborhoods of high structural confidence on causal networks learnt with constraint-based algorithms. The algorithm uses PROPeR to estimate the confidence of all pairwise relations. Maximal neighborhoods of the skeleton with minimum confidence above a user-defined threshold are then identified using the Bron-Kerbosch algorithm for identifying maximal cliques. 
In our empirical evaluation, we demonstrate that (a) the posterior probability estimates for pairwise relations are reasonable and comparable with estimates obtained using more expensive Bayesian methods and (b) BiND identifies sub-networks with higher structural precision and recall than the output of the constraint-based algorithm.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Triantafilou, S and Tsamardinos, I and Roumpelaki, A}, biburl = {https://www.bibsonomy.org/bibtex/243fdaaa2e9f772d8118c7dde2a646d78/mensxmachina}, const = {\ text}, doi = {10.1007/978-3-319-11433-0_32}, interhash = {7c70b36fba1e7e1362e070bd8079a036}, intrahash = {43fdaaa2e9f772d8118c7dde2a646d78}, journal = {Springer 2014}, keywords = {imported}, timestamp = {2020-04-15T10:39:11.000+0200}, title = {Learning Neighborhoods of High Confidence in Constraint-Based Causal Discovery.}, url = {http://link.springer.com/content/pdf/10.1007/978-3-319-11433-0.pdf{\#}page=498}, year = 2014 }
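The clique-finding step mentioned in the abstract above can be sketched as follows. This is one possible reading, not the BiND algorithm itself: it assumes a confidence value is already available for every pair of variables (the role PROPeR plays in the paper), links two variables when that value reaches the threshold, and runs the basic Bron-Kerbosch recursion without pivoting. The function names, the confidence table, and the threshold are all hypothetical.

```python
from itertools import combinations

def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch: append every maximal clique extending r to out."""
    if not p and not x:
        out.append(frozenset(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}  # v has been fully explored
        x = x | {v}  # exclude v from later cliques at this level

def high_confidence_neighbourhoods(variables, confidence, threshold):
    """Maximal sets of variables whose pairwise confidences all reach the threshold."""
    adj = {v: set() for v in variables}
    for a, b in combinations(variables, 2):
        if confidence[frozenset((a, b))] >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    out = []
    bron_kerbosch(set(), set(variables), set(), adj, out)
    return out

# Hypothetical pairwise confidences for four variables.
variables = ["A", "B", "C", "D"]
confidence = {
    frozenset("AB"): 0.95, frozenset("AC"): 0.90, frozenset("BC"): 0.92,
    frozenset("AD"): 0.60, frozenset("BD"): 0.97, frozenset("CD"): 0.50,
}
cliques = high_confidence_neighbourhoods(variables, confidence, threshold=0.9)
print(sorted(sorted(c) for c in cliques))
# [['A', 'B', 'C'], ['B', 'D']]
```

Every pair inside a returned set meets the threshold, so the minimum confidence within each neighbourhood is at least the threshold, in line with the abstract's description. A variable with no qualifying pair is reported as a singleton.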
- I. Tsamardinos, A. Rakhshani, and V. Lagani, "Performance-estimation properties of cross-validation-based protocols with simultaneous hyper-parameter optimization," 8th Hellenic Conference on Artificial Intelligence (SETN 2014), 2014. doi:10.1007/978-3-319-07064-3_1
In a typical supervised data analysis task, one needs to perform the following two tasks: (a) select the best combination of learning methods (e.g., for variable selection and classifier) and tune their hyper-parameters (e.g., K in K-NN), also called model selection, and (b) provide an estimate of the performance of the final, reported model. Combining the two tasks is not trivial because when one selects the set of hyper-parameters that seem to provide the best estimated performance, this estimation is optimistic (biased / overfitted) due to performing multiple statistical comparisons. In this paper, we confirm that the simple Cross-Validation with model selection is indeed optimistic (overestimates) in small sample scenarios. In comparison the Nested Cross Validation and the method by Tibshirani and Tibshirani provide conservative estimations, with the latter protocol being more computationally efficient. The role of stratification of samples is examined and it is shown that stratification is beneficial.
@article{Tsamardinos2014, abstract = {In a typical supervised data analysis task, one needs to perform the following two tasks: (a) select the best combination of learning methods (e.g., for variable selection and classifier) and tune their hyper-parameters (e.g., K in K-NN), also called model selection, and (b) provide an estimate of the performance of the final, reported model. Combining the two tasks is not trivial because when one selects the set of hyper-parameters that seem to provide the best estimated performance, this estimation is optimistic (biased / overfitted) due to performing multiple statistical comparisons. In this paper, we confirm that the simple Cross-Validation with model selection is indeed optimistic (overestimates) in small sample scenarios. In comparison the Nested Cross Validation and the method by Tibshirani and Tibshirani provide conservative estimations, with the latter protocol being more computationally efficient. The role of stratification of samples is examined and it is shown that stratification is beneficial.}, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Rakhshani, A and Lagani, V}, biburl = {https://www.bibsonomy.org/bibtex/279946b13bb69eb7e4060e7d110e8fb71/mensxmachina}, const = {\ text}, doi = {10.1007/978-3-319-07064-3_1}, interhash = {0c654f88a674bea8c18a7e76c9cb8394}, intrahash = {79946b13bb69eb7e4060e7d110e8fb71}, journal = {8th Hellenic Conference on Artificial Intelligence (SETN 2014).}, keywords = {imported}, timestamp = {2020-04-15T10:36:30.000+0200}, title = {Performance-estimation properties of cross-validation-based protocols with simultaneous hyper-parameter optimization}, url = {http://link.springer.com/chapter/10.1007/978-3-319-07064-3{\_}1}, year = 2014 }
- G. T. Huang, I. Tsamardinos, V. Raghu, N. Kaminski, and P. V. Benos, "T-ReCS: stable selection of dynamically formed groups of features with application to prediction of clinical outcomes," Pacific Symposium on Biocomputing (PSB), 2014.
@article{Huang2014, added-at = {2018-12-23T19:41:26.000+0100}, author = {Huang, Grace T. and Tsamardinos, Ioannis and Raghu, Vineet and Kaminski, Naftali and Benos, Panayiotis V.}, biburl = {https://www.bibsonomy.org/bibtex/292f03575ac8cd7eb5dc2a752fefddde4/mensxmachina}, const = {\ text}, interhash = {3c81e3c323e325c5d1e30d66ae6d2043}, intrahash = {92f03575ac8cd7eb5dc2a752fefddde4}, journal = {Pacific Symposium on Biocomputing (PSB)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {T-ReCS: stable selection of dynamically formed groups of features with application to prediction of clinical outcomes.}, url = {http://europepmc.org/articles/pmc4299881}, year = 2014 }
2013
- G. L. Papadopoulos, E. Karkoulia, I. Tsamardinos, C. Porcher, J. Ragoussis, J. Bungert, and J. Strouboulis, "GATA-1 genome-wide occupancy associates with distinct epigenetic profiles in mouse fetal liver erythropoiesis," Nucl. Acids Res, iss. 9, 2013.
@article{Papadopoulos2013, added-at = {2018-12-23T19:41:26.000+0100}, author = {Papadopoulos, Giorgio L and Karkoulia, Elena and Tsamardinos, Ioannis and Porcher, Catherine and Ragoussis, Jiannis and Bungert, Jorg and Strouboulis, John}, biburl = {https://www.bibsonomy.org/bibtex/20afcf462084517381f91e6bde355718e/mensxmachina}, const = {\ text}, interhash = {21e1c87c70d7a2e78cf318352a180cc7}, intrahash = {0afcf462084517381f91e6bde355718e}, journal = {Nucl. Acids Res}, keywords = {imported}, number = 9, timestamp = {2018-12-23T19:41:26.000+0100}, title = {GATA-1 genome-wide occupancy associates with distinct epigenetic profiles in mouse fetal liver erythropoiesis}, url = {http://nar.oxfordjournals.org/content/early/2013/03/20/nar.gkt167.short}, year = 2013 }
- V. Lagani, L. Koumakis, F. Chiarugi, E. Lakasing, and I. Tsamardinos, "A systematic review of predictive risk models for diabetes complications based on large scale clinical studies," Journal of Diabetes and Its Complications, iss. 4, 2013.
@article{Lagani2013, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, Vincenzo and Koumakis, Lefteris and Chiarugi, Franco and Lakasing, Edin and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/252188476a8bf580f6cd9679758d3cd3c/mensxmachina}, const = {\ text}, interhash = {0fee57926890bb1d3be0135165a62777}, intrahash = {52188476a8bf580f6cd9679758d3cd3c}, journal = {Journal of Diabetes and Its Complications}, keywords = {imported}, number = 4, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A systematic review of predictive risk models for diabetes complications based on large scale clinical studies}, url = {http://www.sciencedirect.com/science/article/pii/S1056872712003303}, year = 2013 }
- V. Lagani, G. Kortas, and I. Tsamardinos, "Biomarker signature identification in “omics” data with multi-class outcome," Computational and Structural Biotechnology Journal, vol. 6, iss. 7, 2013.
@article{Lagani2013a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, V and Kortas, G and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/275d0ca9c85be8474cebb5e3a5ea947b2/mensxmachina}, const = {\ text}, interhash = {f59da4d0345cfb95eee8059bb0c48eec}, intrahash = {75d0ca9c85be8474cebb5e3a5ea947b2}, journal = {Computational and Structural Biotechnology Journal, 2013, 6(7)}, keywords = {imported}, number = 7, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Biomarker signature identification in “omics” data with multi-class outcome}, url = {http://www.sciencedirect.com/science/article/pii/S2001037014601136}, year = 2013 }
- N. Karathanasis, I. Tsamardinos, and P. Poirazi, "A bioinformatics approach for investigating the determinants of Drosha processing," 13th IEEE International Conference on Bioinformatics and Bioengineering (IEEE BIBE 2013), 2013.
@article{Karathanasis2013, added-at = {2018-12-23T19:41:26.000+0100}, author = {Karathanasis, N. and Tsamardinos, I. and Poirazi, P}, biburl = {https://www.bibsonomy.org/bibtex/2b2e5e061fcc1711dd2eac0ce71db5529/mensxmachina}, const = {\ text}, interhash = {881890a1743b798cd2094e07496ae07a}, intrahash = {b2e5e061fcc1711dd2eac0ce71db5529}, journal = {13th IEEE International Conference on Bioinformatics and Bioengineering (IEEE BIBE 2013)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A bioinformatics approach for investigating the determinants of Drosha processing}, url = {http://ieeexplore.ieee.org/xpls/abs{\_}all.jsp?arnumber=6701569}, year = 2013 }
- I. Karakasilioti, I. Kamileri, G. Chatzinikolaou, T. Kosteas, E. Vergadi, A. R. Robinson, I. Tsamardinos, T. A. Rozgaja, S. Siakouli, C. Tsatsanis, L. J. Niedernhofer, and G. A. Garinis, "DNA damage triggers a chronic autoinflammatory response, leading to fat depletion in NER progeria," Cell Metabolism, iss. 3, 2013.
@article{Karakasilioti2013, added-at = {2018-12-23T19:41:26.000+0100}, author = {Karakasilioti, I and Kamileri, I and Chatzinikolaou, G. and Kosteas, T. and Vergadi, E. and Robinson, A. R. and Tsamardinos, I. and Rozgaja, T. A. and Siakouli, S. and Tsatsanis, C. and Niedernhofer, L. J. and Garinis, G. A.}, biburl = {https://www.bibsonomy.org/bibtex/25bb14e914a70c09a8e984f32c6f7f687/mensxmachina}, const = {\ text}, interhash = {92fd18e183e51fc4e4fccb96a3d031c1}, intrahash = {5bb14e914a70c09a8e984f32c6f7f687}, journal = {Cell Metabolism}, keywords = {imported}, number = 3, timestamp = {2018-12-23T19:41:26.000+0100}, title = {DNA damage triggers a chronic autoinflammatory response, leading to fat depletion in NER progeria}, url = {http://www.sciencedirect.com/science/article/pii/S1550413113003379}, year = 2013 }
- P. Hunter, T. Chapman, P. Coveney, B. de Bono, V. Diaz, J. Fenner, A. Frangi, P. Harris, R. Hose, P. Kohl, P. Lawford, K. McCormack, M. Mendes, S. Omholt, A. Quarteroni, N. Shublaq, J. Skår, K. Stroetmann, J. Tegner, S. Thomas, I. Tollis, I. Tsamardinos, J. van Beek, and M. Viceconti, "A vision and strategy for the virtual physiological human: 2012 update," Interface Focus, iss. 2, 2013.
@article{Hunter2013, added-at = {2018-12-23T19:41:26.000+0100}, author = {Hunter, P. and Chapman, T and Coveney, PV and de Bono, B and Diaz, V and Fenner, J and Frangi, AF and Harris, P and Hose, R and Kohl, P and Lawford, P and McCormack, K and Mendes, M and Omholt, S and Quarteroni, A and Shublaq, N and Sk{\aa}r, J and Stroetmann, K and Tegner, J and Thomas, SR and Tollis, I and Tsamardinos, I and van Beek, JHGM and Viceconti, M.}, biburl = {https://www.bibsonomy.org/bibtex/2c07ab67d81b2f90413ed9d43a72513aa/mensxmachina}, const = {\ text}, interhash = {fbdda298ac0cacb556e0d42fe4729adf}, intrahash = {c07ab67d81b2f90413ed9d43a72513aa}, journal = {Interface Focus}, keywords = {imported}, number = 2, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A vision and strategy for the virtual physiological human: 2012 update}, url = {http://rsfs.royalsocietypublishing.org/content/3/2/20130004.short}, year = 2013 }
2012
- I. Tsamardinos, S. Triantafillou, and V. Lagani, "Towards integrative causal analysis of heterogeneous data sets and studies," Journal of Machine Learning Research, iss. 1, 2012.
@article{Tsamardinos2012b, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, Ioannis and Triantafillou, Sofia and Lagani, Vincenzo}, biburl = {https://www.bibsonomy.org/bibtex/2fb92519060ffc9129cd2bef7ea194602/mensxmachina}, const = {\ text}, interhash = {c123d94724c777656772bfe6b41db534}, intrahash = {fb92519060ffc9129cd2bef7ea194602}, journal = {Journal of Machine Learning Research}, keywords = {imported}, number = 1, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Towards integrative causal analysis of heterogeneous data sets and studies}, url = {http://dl.acm.org/citation.cfm?id=2343683}, year = 2012 }
- I. Tsamardinos, V. Lagani, and D. Pappas, "Discovering multiple, equivalent biomarker signatures," Proceedings of the 7th Conference of the Hellenic Society for Computational Biology & Bioinformatics, 2012.
@article{Tsamardinos2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I. and Lagani, V. and Pappas, D.}, biburl = {https://www.bibsonomy.org/bibtex/2edd82031fcb152af82018a6b3f3dfa09/mensxmachina}, const = {\ text}, interhash = {8e276a79a2ea9bec41b4faffb7f0aab6}, intrahash = {edd82031fcb152af82018a6b3f3dfa09}, journal = {proceedings of the 7th conference of the Hellenic Society for Computational Biology {\&} Bioinformatics}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Discovering multiple, equivalent biomarker signatures.}, year = 2012 }
- I. Tsamardinos, G. Borboudakis, E. Christodoulou, and O. D. Røe, "Chemosensitivity Prediction of Tumours Based on Expression, miRNA, and Proteomics Data," International Journal of Systems Biology and Biomedical Technologies (IJSBBT), iss. 2, 2012.
@article{Tsamardinos2012a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I. and Borboudakis, G. and Christodoulou, E. and R{\o}e, O. D.}, biburl = {https://www.bibsonomy.org/bibtex/271a68b49fd3653307a170d582c95acde/mensxmachina}, const = {\ text}, interhash = {56c4dc9a65e082ce1b2fc8a292c4968a}, intrahash = {71a68b49fd3653307a170d582c95acde}, journal = {International Journal of Systems Biology and Biomedical Technologies (IJSBBT)}, keywords = {imported}, number = 2, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Chemosensitivity Prediction of Tumours Based on Expression, miRNA, and Proteomics Data}, url = {https://www.researchgate.net/profile/Oluf{\_}Roe2/publication/235343751{\_}Chemosensitivity{\_}Prediction{\_}of{\_}Tumours{\_}Based{\_}on{\_}Expression{\_}miRNA{\_}and{\_}Proteomics{\_}Data/links/09e415111773bc158c000000.pdf}, year = 2012 }
- V. Lagani, I. Tsamardinos, and S. Triantafillou, "Learning from Mixture of Experimental Data: A Constraint–Based Approach," Artificial Intelligence: Theories and Applications: 7th Hellenic Conference on AI, 2012.
@article{Lagani2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, V and Tsamardinos, I and Triantafillou, S}, biburl = {https://www.bibsonomy.org/bibtex/2807d52caa13537638d18e6705d6dfb11/mensxmachina}, const = {\ text}, interhash = {def0841ccbf004f960ba0cd437cc754e}, intrahash = {807d52caa13537638d18e6705d6dfb11}, journal = {Artificial Intelligence: Theories and Applications: 7th Hellenic Conference on AI}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Learning from Mixture of Experimental Data: A Constraint–Based Approach}, url = {http://link.springer.com/chapter/10.1007/978-3-642-30448-4{\_}16}, year = 2012 }
- S. Kleisarchaki, D. Kotzinos, I. Tsamardinos, and V. Christophides, "A Methodological Framework for Statistical Analysis of Social Text Streams," International Workshop on Information Search, Integration and Personalization (ISIP 2012), 2012.
@article{Kleisarchaki2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Kleisarchaki, S. and Kotzinos, D. and Tsamardinos, I and Christophides, V}, biburl = {https://www.bibsonomy.org/bibtex/2bce8fc07686901084e593a8f064154fd/mensxmachina}, const = {\ text}, interhash = {1a17bd8bb2a9b29337e1490344df8953}, intrahash = {bce8fc07686901084e593a8f064154fd}, journal = {International Workshop on Information Search, Integration and Personalization (ISIP 2012)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A Methodological Framework for Statistical Analysis of Social Text Streams}, url = {http://link.springer.com/chapter/10.1007/978-3-642-40140-4{\_}11}, year = 2012 }
- S. Dato, M. Soerensen, A. Montesanto, V. Lagani, G. Passarino, K. Christensen, and L. Christiansen, "UCP3 polymorphisms, hand grip performance and survival at old age: Association analysis in two Danish middle aged and elderly cohorts," Joint Meeting AGI-SIBV-SIGA, iss. 8, 2012.
[BibTeX] [Download PDF]@article{Dato2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Dato, S. and Soerensen, M. and Montesanto, A. and Lagani, V. and Passarino, G. and Christensen, K. and Christiansen, L.}, biburl = {https://www.bibsonomy.org/bibtex/255398239685068847bc5ac6ff7148306/mensxmachina}, const = {\ text}, interhash = {ff37273e55424a104db0df30fc5467c9}, intrahash = {55398239685068847bc5ac6ff7148306}, journal = {Joint Meeting AGI-SIBV-SIGA}, keywords = {imported}, number = 8, timestamp = {2018-12-23T19:41:26.000+0100}, title = {UCP3 polymorphisms, hand grip performance and survival at old age: Association analysis in two Danish middle aged and elderly cohorts}, url = {http://www.sciencedirect.com/science/article/pii/S0047637412001054}, year = 2012 }
- S. Dato, A. Montesanto, V. Lagani, B. Jeune, K. Christensen, and G. Passarino, "Frailty phenotypes in the elderly based on cluster analysis: a longitudinal study of two Danish cohorts. Evidence for a genetic influence on frailty," Age, iss. 3, 2012.
[BibTeX] [Download PDF]@article{Dato2012a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Dato, S. and Montesanto, A. and Lagani, V. and Jeune, B. and Christensen, K. and Passarino, G.}, biburl = {https://www.bibsonomy.org/bibtex/24afaf62b1ec583dcc29ca8e1d7937218/mensxmachina}, const = {\ text}, interhash = {4f24543f88583093f2344473005c0e13}, intrahash = {4afaf62b1ec583dcc29ca8e1d7937218}, journal = {Age}, keywords = {imported}, number = 3, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Frailty phenotypes in the elderly based on cluster analysis: a longitudinal study of two Danish cohorts. Evidence for a genetic influence on frailty}, url = {http://link.springer.com/article/10.1007/s11357-011-9257-x}, year = 2012 }
- L. E. Brown, I. Tsamardinos, and D. Hardin, "To feature space and back: Identifying top-weighted features in polynomial support vector machine models," Intelligent Data Analysis, iss. 4, 2012.
[BibTeX] [Download PDF]@article{BrownL.E.TsamardinosI.&Hardin2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Brown, L. E. and Tsamardinos, I. and Hardin, D.}, biburl = {https://www.bibsonomy.org/bibtex/204fddceaf93826049fdad6bbc8eefa3c/mensxmachina}, const = {\ text}, interhash = {a8a0066f9f31f2db39c69d0963ac51dd}, intrahash = {04fddceaf93826049fdad6bbc8eefa3c}, journal = {Intelligent Data Analysis}, keywords = {imported}, number = 4, timestamp = {2018-12-23T19:41:26.000+0100}, title = {To feature space and back: Identifying top-weighted features in polynomial support vector machine models}, url = {http://content.iospress.com/articles/intelligent-data-analysis/ida00539}, year = 2012 }
- G. Borboudakis, S. Triantafillou, and I. Tsamardinos, "Tools and algorithms for causally interpreting directed edges in maximal ancestral graphs," Sixth European Workshop on Probabilistic Graphical Models, (PGM 2012), 2012.
[BibTeX] [Download PDF]@article{Borboudakis2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Borboudakis, G. and Triantafillou, S. and Tsamardinos, I.}, biburl = {https://www.bibsonomy.org/bibtex/2516c02d6bc46c1e753eb4d6e27dc53df/mensxmachina}, const = {\ text}, interhash = {9a1631fcbd398a8d1143fddd3f5d898d}, intrahash = {516c02d6bc46c1e753eb4d6e27dc53df}, journal = {Sixth European Workshop on Probabilistic Graphical Models, (PGM 2012)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Tools and algorithms for causally interpreting directed edges in maximal ancestral graphs}, url = {https://www.researchgate.net/profile/Ioannis{\_}Tsamardinos/publication/265159304{\_}Tools{\_}and{\_}Algorithms{\_}for{\_}Causally{\_}Interpreting{\_}Directed{\_}Edges{\_}in{\_}Maximal{\_}Ancestral{\_}Graphs/links/54043b930cf23d9765a5fe81.pdf}, year = 2012 }
- G. Borboudakis and I. Tsamardinos, "Scoring and searching over Bayesian networks with causal and associative priors," Uncertainty in Artificial Intelligence (UAI), 2012.
[BibTeX] [Download PDF]@article{Borboudakis2012b, added-at = {2018-12-23T19:41:26.000+0100}, author = {Borboudakis, G and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/204b6111522dd0883433ddad64cea1b38/mensxmachina}, const = {\ text}, interhash = {8e653b8d7bbcb82047254e12e2d94790}, intrahash = {04b6111522dd0883433ddad64cea1b38}, journal = {Uncertainty in Artificial Intelligence (UAI)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Scoring and searching over Bayesian networks with causal and associative priors}, url = {http://arxiv.org/abs/1209.6561}, year = 2012 }
- G. Borboudakis and I. Tsamardinos, "Incorporating causal prior knowledge as path-constraints in bayesian networks and maximal ancestral graphs," Proceedings of the 29th International Conference on Machine Learning, ICML 2012, 2012.
[BibTeX] [Download PDF]@article{Borboudakis2012a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Borboudakis, G and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/245e62f2e40437af59cc752ece23f4f9d/mensxmachina}, const = {\ text}, interhash = {33e06f0a90109e7d3dea3523a245cdf2}, intrahash = {45e62f2e40437af59cc752ece23f4f9d}, journal = {Proceedings of the 29th International Conference on Machine Learning, ICML 2012}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Incorporating causal prior knowledge as path-constraints in bayesian networks and maximal ancestral graphs}, url = {http://arxiv.org/abs/1206.6390}, year = 2012 }
- A. P. Armen, I. Tsamardinos, N. Karathanasis, and P. Poirazi, "SVM-based miRNA:miRNA* duplex prediction," IEEE 12th International Conference on Bioinformatics and Bioengineering, BIBE 2012, 2012.
[BibTeX] [Download PDF]@article{Armen2012, added-at = {2018-12-23T19:41:26.000+0100}, author = {Armen, A. P. and Tsamardinos, I. and Karathanasis, N. and Poirazi, P.}, biburl = {https://www.bibsonomy.org/bibtex/2722c65b939cc487f3fc4a5b39a342da1/mensxmachina}, const = {\ text}, interhash = {664f071dbf289def04ab900e7537aab0}, intrahash = {722c65b939cc487f3fc4a5b39a342da1}, journal = {IEEE 12th International Conference on Bioinformatics and Bioengineering, BIBE 2012}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {SVM-based miRNA: MiRNA∗ duplex prediction}, url = {http://ieeexplore.ieee.org/xpls/abs{\_}all.jsp?arnumber=6399670}, year = 2012 }
2011
- I. Tsamardinos, O. D. Roe, and V. Lagani, "Introducing Integrative Causal Analysis for Co-Analyzing Heterogeneous Studies with an Application to Methylation and Gene Expression Mesothelioma Cancer Data," 6th Conference of the Hellenic Society for Computational Biology and Bioinformatics (HSCBB11), 2011.
[BibTeX]@article{Tsamardinos2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I. and Roe, O. D. and Lagani, V.}, biburl = {https://www.bibsonomy.org/bibtex/2b35166fc5f2a3f42dc7c176a776dc485/mensxmachina}, const = {\ text}, interhash = {b262f6fa1fa7efe0e0ce4ecc55110d69}, intrahash = {b35166fc5f2a3f42dc7c176a776dc485}, journal = {6th Conference of the Hellenic Society for Computational Biology and Bioinformatics (HSCBB11)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Introducing Integrative Causal Analysis for Co-Analyzing Heterogeneous Studies with an Application to Methylation and Gene Expression Mesothelioma Cancer Data.}, year = 2011 }
- V. Lagani, I. Tsamardinos, M. Grammatikou, and G. Garinis, "A Genome-Wide Study of the Effect of Aging on Level-2 Gene-Ontology Categories in Mice Using Mixed Models," The 5th International Workshop on Data Mining in Functional Genomics and Proteomics: Current Trends and Future Directions, ECML PKDD 2011, 2011.
[BibTeX]@article{Lagani2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, Vincenzo and Tsamardinos, Ioannis and Grammatikou, Magda and Garinis, George}, biburl = {https://www.bibsonomy.org/bibtex/26ff8f456ff88a57da25ead3b023f47a8/mensxmachina}, const = {\ text}, interhash = {e627413201308c4b22ab36633ea412b4}, intrahash = {6ff8f456ff88a57da25ead3b023f47a8}, journal = {The 5th International Workshop on Data Mining in Functional Genomics and Proteomics: Current Trends and Future Directions, ECML PKDD 2011}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A Genome-Wide Study of the Effect of Aging on Level-2 Gene-Ontology Categories in Mice Using Mixed Models}, year = 2011 }
- V. Lagani, V. Kontogiannis, P. Argyropaidas, and C. Chronaki, "Use of SMS for Tsunami Early Warnings at a Table Top Exercise," 4th ICST International Conference on eHealth (eHealth 2011), 2011.
[BibTeX] [Download PDF]@article{Lagani2011a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, V. and Kontogiannis, V. and Argyropaidas, P. and Chronaki, C.}, biburl = {https://www.bibsonomy.org/bibtex/29f1c0607e255143a3081d6a2a6d1d612/mensxmachina}, const = {\ text}, interhash = {56a013f9384824f68f8c0a14039a43c3}, intrahash = {9f1c0607e255143a3081d6a2a6d1d612}, journal = {4th ICST International Conference on eHealth (eHealth 2011)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Use of SMS for Tsunami Early Warnings at a Table Top Exercise}, url = {http://link.springer.com/chapter/10.1007/978-3-642-29262-0{\_}10}, year = 2011 }
- L. Koumakis, F. Chiarugi, V. Lagani, and I. Tsamardinos, "Risk assessment models for diabetes complications: A survey of available online tools," 2nd International ICST Conference on Wireless Mobile Communication and Healthcare (MobiHealth 2011), 2011.
[BibTeX] [Download PDF]@article{Koumakis2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Koumakis, L. and Chiarugi, F. and Lagani, V. and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2579226968e43224ddd3f3a71d5ba47a0/mensxmachina}, const = {\ text}, interhash = {e33ebdab81736a155879a7151903ca6e}, intrahash = {579226968e43224ddd3f3a71d5ba47a0}, journal = {2nd International ICST Conference on Wireless Mobile Communication and Healthcare (MobiHealth 2011)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Risk assessment models for diabetes complications: A survey of available online tools}, url = {http://link.springer.com/chapter/10.1007/978-3-642-29734-2{\_}7}, year = 2011 }
- C. Filippaki, G. Antoniou, and I. Tsamardinos, "Using constraint optimization for conflict resolution and detail control in activity recognition," Second International Joint Conference on Ambient Intelligence 2011, 2011.
[BibTeX] [Download PDF]@article{Filippaki2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Filippaki, C and Antoniou, G and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2769c75f7768c9fefee6a5f503934ccb8/mensxmachina}, const = {\ text}, interhash = {f8ace0c5333598e01a21f644d29664f1}, intrahash = {769c75f7768c9fefee6a5f503934ccb8}, journal = {Second International Joint Conference on Ambient Intelligence 2011}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Using constraint optimization for conflict resolution and detail control in activity recognition}, url = {http://link.springer.com/10.1007{\%}2F978-3-642-25167-2{\_}6}, year = 2011 }
- E. G. Christodoulou, O. D. Røe, A. Folarin, and I. Tsamardinos, "Information-Preserving Techniques Improve Chemosensitivity Prediction of Tumours Based on Expression Profiles," 12th Engineering Applications of Neural Networks (EANN) / 7th Artificial Intelligence Applications and Innovations (AIAI) joint conferences, Workshop on Computational Intelligence Applications in Bioinformatics (CIAB 2011), 2011.
[BibTeX] [Download PDF]@article{Christodoulou2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Christodoulou, E. G. and R{\o}e, O. D. and Folarin, A. and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/247e8055808b0231202d768604a8a1918/mensxmachina}, const = {\ text}, interhash = {68325a50de33ed86d9b5149968d6d030}, intrahash = {47e8055808b0231202d768604a8a1918}, journal = {12th Engineering Applications of Neural Networks (EANN) / 7th Artificial Intelligence Applications and Innovations (AIAI) joint conferences, Workshop on Computational Intelligence Applications in Bioinformatics (CIAB 2011)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Information-Preserving Techniques Improve Chemosensitivity Prediction of Tumours Based on Expression Profiles}, url = {http://link.springer.com/chapter/10.1007/978-3-642-23957-1{\_}50}, year = 2011 }
- G. Borboudakis, S. Triantafillou, V. Lagani, and I. Tsamardinos, "A constraint-based approach to incorporate prior knowledge in causal models," Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2011.
[BibTeX] [Download PDF]@article{Borboudakis2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Borboudakis, G and Triantafilou, S and Lagani, V and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2785f0fd1b8229f2d87df9af06c9c2701/mensxmachina}, const = {\ text}, interhash = {efbe449feece1a5d284b3e6838c40ffd}, intrahash = {785f0fd1b8229f2d87df9af06c9c2701}, journal = {Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A constraint-based approach to incorporate prior knowledge in causal models.}, url = {https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2011-76.pdf}, year = 2011 }
- A. P. Armen and I. Tsamardinos, "A unified approach to estimation and control of the False Discovery Rate in Bayesian network skeleton identification," Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), 2011.
[BibTeX] [Download PDF]@article{Armen2011, added-at = {2018-12-23T19:41:26.000+0100}, author = {Armen, AP and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2226639af0dd6060ea855ee876f1dcb26/mensxmachina}, const = {\ text}, interhash = {4176056383b75831a27a549f3c9dd48d}, intrahash = {226639af0dd6060ea855ee876f1dcb26}, journal = {Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A unified approach to estimation and control of the False Discovery Rate in Bayesian network skeleton identification.}, url = {http://www.mensxmachina.org/files/publications/armen2011unified.pdf}, year = 2011 }
2010
- I. Tsamardinos and G. Borboudakis, "Permutation testing improves Bayesian network learning," Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III, 2010.
[BibTeX] [Download PDF]@article{Tsamardinos2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Borboudakis, G}, biburl = {https://www.bibsonomy.org/bibtex/2e9a638a93f64ba0198835e00ddec53a8/mensxmachina}, const = {\ text}, interhash = {80093dbd1ab14f50f6b127c66eb9cb89}, intrahash = {e9a638a93f64ba0198835e00ddec53a8}, journal = {Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part III}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Permutation testing improves Bayesian network learning}, url = {http://link.springer.com/chapter/10.1007/978-3-642-15939-8{\_}21}, year = 2010 }
- S. Triantafillou, I. Tsamardinos, and I. G. Tollis, "Learning causal structure from overlapping variable sets," Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
[BibTeX] [Download PDF]@article{Triantafullou2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Triantafullou, S. and Tsamardinos, I. and Tollis, I.G.}, biburl = {https://www.bibsonomy.org/bibtex/2a5f72ebb97a62735b470bc65fa514546/mensxmachina}, const = {\ text}, interhash = {90635b251bb6a2d27e842678bb34300d}, intrahash = {a5f72ebb97a62735b470bc65fa514546}, journal = {Proceedings of The Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Learning causal structure from overlapping variable sets}, url = {http://machinelearning.wustl.edu/mlpapers/paper{\_}files/AISTATS2010{\_}TriantafillouTT10.pdf}, year = 2010 }
- V. Lagani and I. Tsamardinos, "Structure-based variable selection for survival data," Bioinformatics, iss. 15, 2010.
[BibTeX] [Download PDF]@article{Lagani2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Lagani, V and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2160917ab02bbd4f929dd2b0ca7e65967/mensxmachina}, const = {\ text}, interhash = {a5a43dfa45513e26eef4390e3c3a7fdf}, intrahash = {160917ab02bbd4f929dd2b0ca7e65967}, journal = {Bioinformatics}, keywords = {imported}, number = 15, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Structure-based variable selection for survival data}, url = {http://bioinformatics.oxfordjournals.org/content/26/15/1887.short}, year = 2010 }
- P. Hunter, P. V. Coveney, B. de Bono, V. Diaz, J. Fenner, A. F. Frangi, P. Harris, R. Hose, P. Kohl, P. Lawford, K. McCormack, M. Mendes, S. Omholt, A. Quarteroni, J. Skar, J. Tegner, S. Randall, I. G. Tollis, I. Tsamardinos, and M. van Beek, "A vision and strategy for the virtual physiological human in 2010 and beyond," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, iss. 1920, 2010.
[BibTeX] [Download PDF]@article{Hunter2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Hunter, P. and Coveney, P.V. and de Bono, B. and Diaz, V. and Fenner, J. and Frangi, A.F. and Harris, P. and Hose, R. and Kohl, P. and Lawford, P. and McCormack, K. and Mendes, M. and Omholt, S. and Quarteroni, A. and Skar, J. and Tegner, J. and Randall, S.Th. and Tollis, I.G. and Tsamardinos, I. and van Beek, M.}, biburl = {https://www.bibsonomy.org/bibtex/2219c6dfc77827a0d7bfbdd81c51e5960/mensxmachina}, const = {\ text}, interhash = {4d441689fd11d0ee1ce557cd60887095}, intrahash = {219c6dfc77827a0d7bfbdd81c51e5960}, journal = {Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences}, keywords = {imported}, number = 1920, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A vision and strategy for the virtual physiological human in 2010 and beyond}, url = {http://rsta.royalsocietypublishing.org/content/368/1920/2595.short}, year = 2010 }
- K. Gkirtzou, I. Tsamardinos, P. Tsakalides, and P. Poirazi, "MatureBayes: a probabilistic algorithm for identifying the mature miRNA within novel precursors," PloS one, iss. 8, 2010.
[BibTeX] [Download PDF]@article{Gkirtzou2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Gkirtzou, K and Tsamardinos, I and Tsakalides, P and Poirazi, P}, biburl = {https://www.bibsonomy.org/bibtex/2a05231556860501b3e7a7eeb2a12170f/mensxmachina}, const = {\ text}, interhash = {bb47ad98920dd3d727cc5ff9518b2777}, intrahash = {a05231556860501b3e7a7eeb2a12170f}, journal = {PloS one}, keywords = {imported}, number = 8, timestamp = {2018-12-23T19:41:26.000+0100}, title = {MatureBayes: a probabilistic algorithm for identifying the mature miRNA within novel precursors}, url = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0011843}, year = 2010 }
- F. Chiarugi, D. Emmanouilidou, and I. Tsamardinos, "The morphological classification of heartbeats as dominant and non-dominant in ECG signals," Physiological Measurement, iss. 5, 2010.
[BibTeX] [Download PDF]@article{Chiarugi2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Chiarugi, F. and Emmanouilidou, D. and Tsamardinos, I.}, biburl = {https://www.bibsonomy.org/bibtex/223e711710b7499c72ecaac42dcc33b5b/mensxmachina}, const = {\ text}, interhash = {9a2038aeebea069e0ab0a1fa49c0c170}, intrahash = {23e711710b7499c72ecaac42dcc33b5b}, journal = {Physiological Measurement}, keywords = {imported}, number = 5, timestamp = {2018-12-23T19:41:26.000+0100}, title = {The morphological classification of heartbeats as dominant and non-dominant in ECG signals}, url = {http://iopscience.iop.org/article/10.1088/0967-3334/31/5/002/meta}, year = 2010 }
- C. F. Aliferis, A. R. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos, "Local causal and markov blanket induction for causal discovery and feature selection for classification part ii: Analysis and extensions," The Journal of Machine Learning Research, 2010.
[BibTeX] [Download PDF]@article{Aliferis2010, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, C.F. and Statnikov, A.R. and Tsamardinos, I. and Mani, S. and Koutsoukos, X.D.}, biburl = {https://www.bibsonomy.org/bibtex/2eb7a1cf9191ce1b7f76fa046e2cd2367/mensxmachina}, const = {\ text}, interhash = {4567b233fe389f3e6694a3f39581f38f}, intrahash = {eb7a1cf9191ce1b7f76fa046e2cd2367}, journal = {The Journal of Machine Learning Research}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Local causal and markov blanket induction for causal discovery and feature selection for classification part ii: Analysis and extensions}, url = {http://dl.acm.org/citation.cfm?id=1756014}, year = 2010 }
- C. F. Aliferis, A. R. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos, "Local causal and markov blanket induction for causal discovery and feature selection for classification part i: Algorithms and empirical evaluation," The Journal of Machine Learning Research, 2010.
[BibTeX] [Download PDF]@article{Aliferis2010a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, C.F. and Statnikov, A.R. and Tsamardinos, I. and Mani, S. and Koutsoukos, X.D.}, biburl = {https://www.bibsonomy.org/bibtex/21fdc0ed5f563b0357270ca9e9a8a2354/mensxmachina}, const = {\ text}, interhash = {270fda0f94b6b8bc193e9060c451497b}, intrahash = {1fdc0ed5f563b0357270ca9e9a8a2354}, journal = {The Journal of Machine Learning Research}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Local causal and markov blanket induction for causal discovery and feature selection for classification part i: Algorithms and empirical evaluation}, url = {http://dl.acm.org/citation.cfm?id=1756014}, year = 2010 }
2009
- I. Tsamardinos and S. Triantafillou, "The possibility of integrative causal analysis: Learning from different datasets and studies," Journal of Engineering Intelligent Systems, iss. 2-3, 2009.
[BibTeX] [Download PDF]@article{Tsamardinos2009, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Triantafillou, S}, biburl = {https://www.bibsonomy.org/bibtex/2c470fe65cb566265d8b422ba42cdbed5/mensxmachina}, const = {\ text}, interhash = {c50da6faf4ad78fe6c02157d854d5b03}, intrahash = {c470fe65cb566265d8b422ba42cdbed5}, journal = {Journal of Engineering Intelligent Systems}, keywords = {imported}, number = {2-3}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {The possibility of integrative causal analysis: Learning from different datasets and studies}, url = {http://cat.inist.fr/?aModele=afficheN{\&}cpsidt=23113749}, year = 2009 }
- I. Tsamardinos and A. P. Mariglis, "Multi-source causal analysis: Learning Bayesian networks from multiple datasets," Artificial Intelligence Applications and Innovations III, Proceedings of the 5th IFIP Conference on Artificial Intelligence Applications and Innovations, 2009.
[BibTeX] [Download PDF]@article{Tsamardinos2009a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Mariglis, AP}, biburl = {https://www.bibsonomy.org/bibtex/2d870a7880d4c1e649566594e77e01878/mensxmachina}, const = {\ text}, interhash = {2490abb54445eb9b9aa4025a866f730c}, intrahash = {d870a7880d4c1e649566594e77e01878}, journal = {Artificial Intelligence Applications and Innovations III, Proceedings of the 5TH IFIP Conference on Artificial Intelligence Applications and Innovations}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Multi-source causal analysis: Learning Bayesian networks from multiple datasets}, url = {http://link.springer.com/chapter/10.1007/978-1-4419-0221-4{\_}56}, year = 2009 }
- C. F. Aliferis, A. R. Statnikov, I. Tsamardinos, J. S. Schildcrout, B. E. Shepherd, and F. E. Harrell Jr., "Factors influencing the statistical power of complex data analysis protocols for molecular signature development from microarray data," PloS one, iss. 3, 2009.
[BibTeX] [Download PDF]@article{Aliferis2009, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, C.F. and Statnikov, A.R. and Tsamardinos, I. and Schildcrout, J.S. and Shepherd, B.E. and Harrell, Jr F.E.}, biburl = {https://www.bibsonomy.org/bibtex/256885ce184272bd9a4c9182c2c25d89f/mensxmachina}, const = {\ text}, interhash = {64db6c934a5cfeb7b05ca9be15717cbf}, intrahash = {56885ce184272bd9a4c9182c2c25d89f}, journal = {PloS one}, keywords = {imported}, number = 3, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Factors influencing the statistical power of complex data analysis protocols for molecular signature development from microarray data}, url = {http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0004922}, year = 2009 }
2008
- I. Tsamardinos and L. Brown, "Bounding the False Discovery Rate in Local Bayesian Network Learning," Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008.
[BibTeX] [Download PDF]@article{Tsamardinos2008, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Brown, LE}, biburl = {https://www.bibsonomy.org/bibtex/271ccc5325fc4dcef8032a57cc1169eb0/mensxmachina}, const = {\ text}, interhash = {c6e5526f4e147ee84ff98f7609703f9d}, intrahash = {71ccc5325fc4dcef8032a57cc1169eb0}, journal = {Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Bounding the False Discovery Rate in Local Bayesian Network Learning.}, url = {http://www.aaai.org/Papers/AAAI/2008/AAAI08-174.pdf}, year = 2008 }
- F. Chiarugi, D. Emmanouilidou, I. Tsamardinos, and I. G. Tollis, "Morphological classification of heartbeats using similarity features and a two-phase decision tree," Computers in Cardiology, 2008.
[BibTeX] [Download PDF]@article{Chiarugi2008, added-at = {2018-12-23T19:41:26.000+0100}, author = {Chiarugi, F. and Emmanouilidou, D. and Tsamardinos, I. and Tollis, I.G}, biburl = {https://www.bibsonomy.org/bibtex/234c1632f7def49e043648cd5596febff/mensxmachina}, const = {\ text}, interhash = {12a0053de8b9cd6c1274cc23f5d38c5b}, intrahash = {34c1632f7def49e043648cd5596febff}, journal = {Computers in Cardiology}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Morphological classification of heartbeats using similarity features and a two-phase decision tree}, url = {http://ieeexplore.ieee.org/xpls/abs{\_}all.jsp?arnumber=4749175}, year = 2008 }
- L. Brown and I. Tsamardinos, "A Strategy for Making Predictions Under Manipulation," JMLR: Workshop and Conference Proceedings, 2008.
[BibTeX] [Download PDF]@article{Brown2008, added-at = {2018-12-23T19:41:26.000+0100}, author = {Brown, LE and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2b2dad9fb018736b581dc5c4a6bdacc67/mensxmachina}, const = {\ text}, interhash = {5b99910433f1d593510f74a41d6a453d}, intrahash = {b2dad9fb018736b581dc5c4a6bdacc67}, journal = {In JMLR: Workshop and Conference Proceedings}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A Strategy for Making Predictions Under Manipulation.}, url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.321.5436{\&}rep=rep1{\&}type=pdf}, year = 2008 }
2006
- I. Tsamardinos, L. Brown, and C. Aliferis, "The max-min hill-climbing Bayesian network structure learning algorithm," Machine learning, iss. 1, 2006.
[BibTeX] [Download PDF]@article{Tsamardinos2006, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Brown, LE and Aliferis, CF}, biburl = {https://www.bibsonomy.org/bibtex/2c807a2121c16bf1dda0304cf0afe64cd/mensxmachina}, const = {\ text}, interhash = {e2c18d415541ed81713bdb0c9b85c554}, intrahash = {c807a2121c16bf1dda0304cf0afe64cd}, journal = {Machine learning}, keywords = {imported}, number = 1, timestamp = {2018-12-23T19:41:26.000+0100}, title = {The max-min hill-climbing Bayesian network structure learning algorithm}, url = {http://link.springer.com/article/10.1007/s10994-006-6889-7}, year = 2006 }
- I. Tsamardinos, A. Statnikov, L. Brown, and C. Aliferis, "Generating Realistic Large Bayesian Networks by Tiling," The 19th International FLAIRS Conference, 2006.
[BibTeX] [Download PDF]@article{Tsamardinos2006a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Statnikov, AR and Brown, LE and Aliferis, CF}, biburl = {https://www.bibsonomy.org/bibtex/25d4c2ebca0b8698e2f1eef6c7b40163b/mensxmachina}, const = {\ text}, interhash = {5452fbf6ccb3501c61c0118b5e17bcfd}, intrahash = {5d4c2ebca0b8698e2f1eef6c7b40163b}, journal = {In The 19th International FLAIRS Conference}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Generating Realistic Large Bayesian Networks by Tiling.}, url = {http://www.aaai.org/Library/FLAIRS/2006/flairs06-116.php}, year = 2006 }
- C. Aliferis, A. Statnikov, and I. Tsamardinos, "Challenges in the analysis of mass-throughput data: a technical commentary from the statistical machine learning perspective," Cancer Informatics, 2006.
[BibTeX] [Download PDF]@article{Aliferis2006, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, CF and Statnikov, A and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2a5332642b94b39335bd5a2148b9b8627/mensxmachina}, const = {\ text}, interhash = {523b67504e61dad87adedcc4a7adbb75}, intrahash = {a5332642b94b39335bd5a2148b9b8627}, journal = {Cancer Informatics}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Challenges in the analysis of mass-throughput data: a technical commentary from the statistical machine learning perspective}, url = {http://search.proquest.com/openview/2d3564860fc6664a1bccac383354a744/1?pq-origsite=gscholar}, year = 2006 }
2005
- A. Statnikov, I. Tsamardinos, Y. Dosbayev, and C. F. Aliferis, "GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data," International journal of medical informatics, iss. 7-8, 2005.
[BibTeX] [Download PDF]@article{Statnikov2005a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Statnikov, A. and Tsamardinos, I. and Dosbayev, Y. and Aliferis, C. F}, biburl = {https://www.bibsonomy.org/bibtex/22d9b271ec7cc9b1595d04108a95f2c90/mensxmachina}, const = {\ text}, interhash = {fe2695ceba306c111171f04ab0510082}, intrahash = {2d9b271ec7cc9b1595d04108a95f2c90}, journal = {International journal of medical informatics}, keywords = {imported}, number = {7-8}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {GEMS: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data}, url = {http://www.sciencedirect.com/science/article/pii/S1386505605000523}, year = 2005 }
- A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy, "A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis," Bioinformatics, iss. 5, 2005.
[BibTeX] [Download PDF]@article{Statnikov2005, added-at = {2018-12-23T19:41:26.000+0100}, author = {Statnikov, A. and Aliferis, C. F. and Tsamardinos, I. and Hardin, D. and Levy, S}, biburl = {https://www.bibsonomy.org/bibtex/29756ae758981dc2f9bdc3d8bb9614936/mensxmachina}, const = {\ text}, interhash = {3808d2ca8ef5429ce10d7e308b2183c4}, intrahash = {9756ae758981dc2f9bdc3d8bb9614936}, journal = {Bioinformatics}, keywords = {imported}, number = 5, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis}, url = {http://bioinformatics.oxfordjournals.org/content/21/5/631.short}, year = 2005 }
- M. Pollack and I. Tsamardinos, "Efficiently dispatching plans encoded as simple temporal problems," Intelligent Techniques for Planning, 2005.
[BibTeX] [Download PDF]@article{Pollack2005, added-at = {2018-12-23T19:41:26.000+0100}, author = {Pollack, ME and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2cc163a50d95746457315e6d52117239b/mensxmachina}, const = {\ text}, interhash = {a78eae5f4415a9bad6787860ddcb9335}, intrahash = {cc163a50d95746457315e6d52117239b}, journal = {Intelligent Techniques for Planning}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Efficiently dispatching plans encoded as simple temporal problems}, url = {https://www.google.com/books?hl=el{\&}lr={\&}id=-MdQtA2GfT0C{\&}oi=fnd{\&}pg=PA296{\&}dq=Pollack,+M.+E.,+{\%}26+Tsamardinos,+I.+(2005).+Efficiently+Dispatching+Plans+Encoded+as+Simple+Temporal+Problems.+Intelligent+Techniques+for+Planning,+296-319.{\&}ots=v6W-KaTlvh{\&}sig=BMlur}, year = 2005 }
- L. Brown, I. Tsamardinos, and C. Aliferis, "A comparison of novel and state-of-the-art polynomial Bayesian network learning algorithms," in Proceedings of the National Conference on Artificial Intelligence, iss. 2, 2005.
[BibTeX] [Download PDF]@article{Brown2005, added-at = {2018-12-23T19:41:26.000+0100}, author = {Brown, LE and Tsamardinos, I and Aliferis, CF}, biburl = {https://www.bibsonomy.org/bibtex/2a5d3661a293836ba4483b8912beaf981/mensxmachina}, const = {\ text}, interhash = {37903cfa409c02acb244b780a0017ca2}, intrahash = {a5d3661a293836ba4483b8912beaf981}, journal = {In Proceedings of the national conference on artificial intelligence}, keywords = {imported}, number = 2, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A comparison of novel and state-of-the-art polynomial Bayesian network learning algorithms}, url = {http://www.aaai.org/Papers/AAAI/2005/AAAI05-116.pdf}, year = 2005 }
- Y. Aphinyanaphongs, I. Tsamardinos, A. Statnikov, D. Hardin, and C. F. Aliferis, "Text categorization models for high-quality article retrieval in internal medicine," Journal of the American Medical Informatics Association, iss. 2, 2005.
[BibTeX] [Download PDF]@article{Aphinyanaphongs2005, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aphinyanaphongs, Y. and Tsamardinos, I. and Statnikov, A. and Hardin, D. and Aliferis, C. F}, biburl = {https://www.bibsonomy.org/bibtex/2b037f28c3d7f26d80ee87d7700c41c59/mensxmachina}, const = {\ text}, interhash = {410e0ecdb1308708c13739d8b99cdda0}, intrahash = {b037f28c3d7f26d80ee87d7700c41c59}, journal = {Journal of the American Medical Informatics Association}, keywords = {imported}, number = 2, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Text categorization models for high-quality article retrieval in internal medicine}, url = {http://www.sciencedirect.com/science/article/pii/S106750270400194X}, year = 2005 }
2004
- A. Statnikov, C. Aliferis, and I. Tsamardinos, "Methods for multi-category cancer diagnosis from gene expression data: a comprehensive evaluation to inform decision support system development," in Proceedings of the 11th World Congress in Medical Informatics (MEDINFO '04), iss. 2, 2004.
[BibTeX] [Download PDF]@article{Statnikov2004, added-at = {2018-12-23T19:41:26.000+0100}, author = {Statnikov, A and Aliferis, CF and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/217f4c8c5197f88c16b9eba677f09c2f7/mensxmachina}, const = {\ text}, interhash = {574104da3d2d801ecf0749d9aa173855}, intrahash = {17f4c8c5197f88c16b9eba677f09c2f7}, journal = {in Proceedings of the 11th World Congress in Medical Informatics (MEDINFO '04)}, keywords = {imported}, number = 2, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Methods for multi-category cancer diagnosis from gene expression data: a comprehensive evaluation to inform decision support system development}, url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.128.11{\&}rep=rep1{\&}type=pdf}, year = 2004 }
- D. Hardin, I. Tsamardinos, and C. Aliferis, "A theoretical characterization of linear SVM-based feature selection," The Twenty-First International Conference on Machine Learning (ICML 2004), 2004.
[BibTeX] [Download PDF]@article{Hardin2004, added-at = {2018-12-23T19:41:26.000+0100}, author = {Hardin, D and Tsamardinos, I and Aliferis, CF}, biburl = {https://www.bibsonomy.org/bibtex/2b6de8a84d5649baa0b6c5b9283c222c1/mensxmachina}, const = {\ text}, interhash = {fcd4d72209d7fead815d8b263ce1d4d1}, intrahash = {b6de8a84d5649baa0b6c5b9283c222c1}, journal = {The Twenty-First International Conference on Machine Learning (ICML 2004)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A theoretical characterization of linear SVM-based feature selection}, url = {http://dl.acm.org/citation.cfm?id=1015421}, year = 2004 }
- L. Brown, I. Tsamardinos, and C. Aliferis, "A novel algorithm for scalable and accurate Bayesian network learning," in Proceedings of 11th World Congress in Medical Informatics (MEDINFO '04), iss. 1, 2004.
[BibTeX] [Download PDF]@article{Brown2004, added-at = {2018-12-23T19:41:26.000+0100}, author = {Brown, LE and Tsamardinos, I and Aliferis, CF}, biburl = {https://www.bibsonomy.org/bibtex/2aa4c24a77fd89828bda43abe88fba2af/mensxmachina}, const = {\ text}, interhash = {56c42b78901dc469545733988fedf496}, intrahash = {aa4c24a77fd89828bda43abe88fba2af}, journal = {in Proceedings of 11th World Congress in Medical Informatics (MEDINFO '04)}, keywords = {imported}, number = 1, timestamp = {2018-12-23T19:41:26.000+0100}, title = {A novel algorithm for scalable and accurate Bayesian network learning}, url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.114.549{\&}rep=rep1{\&}type=pdf}, year = 2004 }
2003
- I. Tsamardinos and C. Aliferis, "Towards Principled Feature Selection: Relevancy, Filters and Wrappers," 2003.
[BibTeX] [Download PDF]@inproceedings{Tsamardinos2003, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Aliferis, CF}, biburl = {https://www.bibsonomy.org/bibtex/2d36de88dfca2582e59a5b9cd1cba4ee7/mensxmachina}, const = {\ text}, interhash = {94cf2d3b6cb9b532f858589cd9040028}, intrahash = {d36de88dfca2582e59a5b9cd1cba4ee7}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Towards Principled Feature Selection: Relevancy, Filters and Wrappers.}, url = {https://pdfs.semanticscholar.org/bea3/09c3ab26ba06ff8208f41db3ca96c15b5a6c.pdf}, year = 2003 }
- I. Tsamardinos, M. E. Pollack, and S. Ramakrishnan, "Assessing the probability of legal execution of plans with temporal uncertainty," ICAPS03 Workshop on Planning under Uncertainty and Incomplete Information, 2003.
[BibTeX] [Download PDF]@article{Tsamardinos2003d, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I. and Pollack, M. E. and Ramakrishnan, S.}, biburl = {https://www.bibsonomy.org/bibtex/209a72015121e4da24ca3dd2c9eb39beb/mensxmachina}, const = {\ text}, interhash = {11c0cfa3a8b471d19dd2e5fe238753c9}, intrahash = {09a72015121e4da24ca3dd2c9eb39beb}, journal = {ICAPS03 Workshop on Planning under Uncertainty and Incomplete Information}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Assessing the probability of legal execution of plans with temporal uncertainty}, url = {https://www.researchgate.net/profile/Ioannis{\_}Tsamardinos/publication/2481676{\_}Assessing{\_}the{\_}Probability{\_}of{\_}Legal{\_}Execution{\_}of{\_}Plans{\_}with{\_}Temporal{\_}Uncertainty/links/00b49520e1396c30c0000000.pdf}, year = 2003 }
- I. Tsamardinos and M. Pollack, "Efficient solution techniques for disjunctive temporal reasoning problems," Artificial Intelligence, iss. 1-2, 2003.
[BibTeX] [Download PDF]@article{Tsamardinos2003c, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Pollack, ME}, biburl = {https://www.bibsonomy.org/bibtex/2faf2dbf8d496161f826d13eb1afe92ec/mensxmachina}, const = {\ text}, interhash = {a3a0d51bb29071e61eb4a40d08cc1fd9}, intrahash = {faf2dbf8d496161f826d13eb1afe92ec}, journal = {Artificial Intelligence}, keywords = {imported}, number = {1-2}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Efficient solution techniques for disjunctive temporal reasoning problems}, url = {http://www.sciencedirect.com/science/article/pii/S0004370203001139}, year = 2003 }
- I. Tsamardinos, C. Aliferis, and A. Statnikov, "Time and sample efficient discovery of Markov blankets and direct causal relations," The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003), 2003.
[BibTeX] [Download PDF]@article{Tsamardinos2003e, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Aliferis, CF and Statnikov, A}, biburl = {https://www.bibsonomy.org/bibtex/24b818e750278c74fd63ba62ed17aafbd/mensxmachina}, const = {\ text}, interhash = {0986575ecec243520deae406ff789aef}, intrahash = {4b818e750278c74fd63ba62ed17aafbd}, journal = {The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003)}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Time and sample efficient discovery of Markov blankets and direct causal relations}, url = {http://dl.acm.org/citation.cfm?id=956838}, year = 2003 }
- I. Tsamardinos, T. Vidal, and M. Pollack, "CTP: A new constraint-based formalism for conditional, temporal planning," Special Issue on Planning of Constraints Journal, iss. 4, 2003.
[BibTeX] [Download PDF]@article{Tsamardinos2003b, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Vidal, T and Pollack, ME}, biburl = {https://www.bibsonomy.org/bibtex/27e1e3eecef2845635b5450ab550c99a1/mensxmachina}, const = {\ text}, interhash = {0f6f8073f8cae23c8463af1d2d48f880}, intrahash = {7e1e3eecef2845635b5450ab550c99a1}, journal = {Special Issue on Planning of Constraints Journal}, keywords = {imported}, number = 4, timestamp = {2018-12-23T21:03:57.000+0100}, title = {CTP: A new constraint-based formalism for conditional, temporal planning}, url = {http://link.springer.com/article/10.1023/A:1025894003623}, year = 2003 }
- I. Tsamardinos, C. Aliferis, A. Statnikov, and E. Statnikov, "Algorithms for Large Scale Markov Blanket Discovery," FLAIRS conference, 2003.
[BibTeX] [Download PDF]@article{Tsamardinos2003a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Aliferis, CF and Statnikov, AR and Statnikov, E}, biburl = {https://www.bibsonomy.org/bibtex/278724fd9e32de67f5076bb2963a31a15/mensxmachina}, const = {\ text}, interhash = {2ff0ea24c5e1d091e48113958abd042c}, intrahash = {78724fd9e32de67f5076bb2963a31a15}, journal = {FLAIRS conference}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Algorithms for Large Scale Markov Blanket Discovery.}, url = {http://www.aaai.org/Papers/FLAIRS/2003/Flairs03-073.pdf}, year = 2003 }
- M. E. Pollack, L. Brown, D. Colbry, C. E. McCarthy, C. Orosz, B. Peintner, S. Ramakrishnan, and I. Tsamardinos, "Autominder: An intelligent cognitive orthotic system for people with memory impairment," Robotics and Autonomous Systems, iss. 3-4, 2003.
[BibTeX] [Download PDF]@article{Pollack2003, added-at = {2018-12-23T19:41:26.000+0100}, author = {Pollack, M. E. and Brown, L. and Colbry, D. and McCarthy, C. E. and Orosz, C. and Peintner, B. and Ramakrishnan, S and Tsamardinos, I.}, biburl = {https://www.bibsonomy.org/bibtex/22640a8c90274a7c7e8a4659a3dd0d30e/mensxmachina}, const = {\ text}, interhash = {da9f187c1d84768cfc034a91f02d7411}, intrahash = {2640a8c90274a7c7e8a4659a3dd0d30e}, journal = {Robotics and Autonomous Systems}, keywords = {imported}, number = {3-4}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Autominder: An intelligent cognitive orthotic system for people with memory impairment}, url = {http://www.sciencedirect.com/science/article/pii/S0921889003000770}, year = 2003 }
- L. Frey, D. Fisher, I. Tsamardinos, C. F. Aliferis, and A. Statnikov, "Identifying Markov blankets with decision tree induction," The Third IEEE International Conference on Data Mining (ICDM'03), 2003.
[BibTeX] [Download PDF]@article{Frey2003, added-at = {2018-12-23T19:41:26.000+0100}, author = {Frey, Lewis and Fisher, Douglas and Tsamardinos, Ioannis and Aliferis, Constantin F. and Statnikov, Alexander}, biburl = {https://www.bibsonomy.org/bibtex/2a8283ed5b531af372a80b5ae4888dc0a/mensxmachina}, const = {\ text}, interhash = {a9d439786ce9a8a6f8c7aefc314511b2}, intrahash = {a8283ed5b531af372a80b5ae4888dc0a}, journal = {The Third IEEE International Conference on Data Mining (ICDM'03)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Identifying Markov blankets with decision tree induction}, url = {http://ieeexplore.ieee.org/xpls/abs{\_}all.jsp?arnumber=1250903}, year = 2003 }
- C. F. Aliferis, I. Tsamardinos, and A. Statnikov, "HITON: a novel Markov Blanket algorithm for optimal variable selection," the American Medical Informatics Association meeting 2003 (AMIA 2003), 2003.
[BibTeX] [Download PDF]@article{Aliferis2003b, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, Constantin F. and Tsamardinos, Ioannis and Statnikov, Alexander}, biburl = {https://www.bibsonomy.org/bibtex/24fb92705f2c21f8e572be0492d60dbcf/mensxmachina}, const = {\ text}, interhash = {0ecc8424850ee972bad77af01b623696}, intrahash = {4fb92705f2c21f8e572be0492d60dbcf}, journal = {the American Medical Informatics Association meeting 2003 (AMIA 2003)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {HITON: a novel Markov Blanket algorithm for optimal variable selection}, url = {http://www.ncbi.nlm.nih.gov/pmc/articles/pmc1480117/}, year = 2003 }
- C. Aliferis, I. Tsamardinos, A. Statnikov, and L. Brown, "Causal Explorer: A Causal Probabilistic Network Learning Toolkit for Biomedical Discovery," International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS '03), 2003.
[BibTeX] [Download PDF]@article{Aliferis2003a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, CF and Tsamardinos, I and Statnikov, AR and Brown, LE}, biburl = {https://www.bibsonomy.org/bibtex/2a81e18383c6b8fcdcafe32612c0d8ff7/mensxmachina}, const = {\ text}, interhash = {0b89ce976f3a19985af45d9a47915ebb}, intrahash = {a81e18383c6b8fcdcafe32612c0d8ff7}, journal = {International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS '03)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Causal Explorer: A Causal Probabilistic Network Learning Toolkit for Biomedical Discovery.}, url = {http://dsl-lab.org/ml{\_}tutorial{\_}old/Publications/Causal{\_}Explorer.pdf}, year = 2003 }
- C. F. Aliferis, I. Tsamardinos, P. Massion, A. Statnikov, and D. Hardin, "Why Classification Models Using Array Gene Expression Data Perform So Well: A Preliminary Investigation of Explanatory Factors," International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS '03), 2003.
[BibTeX] [Download PDF]@article{Aliferis2003, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, C. F. and Tsamardinos, I. and Massion, P. and Statnikov, A. and Hardin, D.}, biburl = {https://www.bibsonomy.org/bibtex/27a2fbbc6198fedfe8ce505d5b765afa1/mensxmachina}, const = {\ text}, interhash = {a9cef44adcf3dc05363e0148ae991611}, intrahash = {7a2fbbc6198fedfe8ce505d5b765afa1}, journal = {International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS '03)}, keywords = {imported}, timestamp = {2018-12-23T19:41:26.000+0100}, title = {Why Classification Models Using Array Gene Expression Data Perform So Well: A Preliminary Investigation of Explanatory Factors.}, url = {http://ccdlab.org/paper-pdfs/METMBS{\_}2003{\_}1.pdf}, year = 2003 }
2002
- M. E. Pollack, C. E. McCarthy, S. Ramakrishnan, and I. Tsamardinos, "Execution-time plan management for a cognitive orthotic system," Plan-Based Control of Robotic Agents, 2002.
[BibTeX] [Download PDF]@article{Pollack2002a, added-at = {2018-12-23T19:41:26.000+0100}, author = {Pollack, Martha E. and McCarthy, Colleen E. and Ramakrishnan, Sailesh and Tsamardinos, Ioannis}, biburl = {https://www.bibsonomy.org/bibtex/2d9888cf1ec9a9d41243a017b4fe4f9e0/mensxmachina}, const = {\ text}, interhash = {a6b022b252d13ab185e7fcb3c4cdf3cd}, intrahash = {d9888cf1ec9a9d41243a017b4fe4f9e0}, journal = {Plan-Based Control of Robotic Agents}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Execution-time plan management for a cognitive orthotic system}, url = {http://link.springer.com/chapter/10.1007/3-540-37724-7{\_}11}, year = 2002 }
- A. Berfield, P. K. Chrysanthis, I. Tsamardinos, M. E. Pollack, and S. Banerjee, "A scheme for integrating e-services in establishing virtual enterprises," 12th International Workshop on Research Issues on Data Engineering (RIDE-02), 2002.
[BibTeX] [Download PDF]@article{Berfield2002, added-at = {2018-12-23T19:41:26.000+0100}, author = {Berfield, Alan and Chrysanthis, Panos K. and Tsamardinos, Ioannis and Pollack, Martha E. and Banerjee, Sujata}, biburl = {https://www.bibsonomy.org/bibtex/2a4465ee923deaef9518b8b37b1210069/mensxmachina}, const = {\ text}, interhash = {643b74b1ea982a58569fd1eb48451aac}, intrahash = {a4465ee923deaef9518b8b37b1210069}, journal = {12th International Workshop on Research Issues on Data Engineering (RIDE-02)}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {A scheme for integrating e-services in establishing virtual enterprises}, url = {http://ieeexplore.ieee.org/xpls/abs{\_}all.jsp?arnumber=995107}, year = 2002 }
- M. E. Pollack, C. E. McCarthy, S. Ramakrishnan, I. Tsamardinos, L. Brown, S. Carrion, D. Colbry, C. Orosz, and B. Peintner, "Autominder: A planning, monitoring, and reminding assistive agent," Proceedings of the 7th International Conference on Intelligent Autonomous Systems (IAS), 2002.
[BibTeX] [Download PDF]@article{Pollack2002, added-at = {2018-12-23T19:41:26.000+0100}, author = {Pollack, M. E. and McCarthy, C. E . and Ramakrishnan, S. and Tsamardinos, I. and Brown, L. and Carrion, S. and Colbry, D. and Orosz, C. and Peintner, B.}, biburl = {https://www.bibsonomy.org/bibtex/2c0097ac3dd08ba433294c754d493b80f/mensxmachina}, const = {\ text}, interhash = {12687e5d1237733b7906110abba0df9c}, intrahash = {c0097ac3dd08ba433294c754d493b80f}, journal = {Proceedings of the 7th International Conference on Intelligent Autonomous Systems (IAS)}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Autominder: A planning, monitoring, and reminding assistive agent}, url = {http://dsl-lab.org/Publications/Pollack{\_}2002b.pdf}, year = 2002 }
- C. F. Aliferis, I. Tsamardinos, P. Massion, A. Statnikov, and D. Hardin, "Machine learning models for lung cancer classification using array comparative genomic hybridization," 16th International FLAIRS Conference, 2002.
[BibTeX]@article{Aliferis2002, added-at = {2018-12-23T19:41:26.000+0100}, author = {Aliferis, Constantin F. and Tsamardinos, Ioannis and Massion, Pierre and Statnikov, Alexander and Hardin, Douglas}, biburl = {https://www.bibsonomy.org/bibtex/247dad61cf26d26061567d065a56ae604/mensxmachina}, const = {\ text}, interhash = {46dd3fcd857ee0c1a2cf798ae33b79b8}, intrahash = {47dad61cf26d26061567d065a56ae604}, journal = {16th International FLAIRS Conference}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Machine learning models for lung cancer classification using array comparative genomic hybridization.}, year = 2002 }
- I. Tsamardinos, "A probabilistic approach to robust execution of temporal plans with uncertainty," Proceedings of the 2nd Greek National Conference on Artificial Intelligence, 2002.
[BibTeX] [Download PDF]@article{Tsamardinos2002, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/26536c53d2233343b3131fa19bc9b81ba/mensxmachina}, const = {\ text}, interhash = {7869bcab99f80ab91b1c342618623805}, intrahash = {6536c53d2233343b3131fa19bc9b81ba}, journal = {Proceedings of the 2nd Greek National Conference on Artificial Intelligence}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {A probabilistic approach to robust execution of temporal plans with uncertainty}, url = {http://link.springer.com/chapter/10.1007/3-540-46014-4{\_}10}, year = 2002 }
2001
- I. Tsamardinos, M. Pollack, and P. Ganchev, "Flexible dispatch of disjunctive plans," Proceedings of Sixth European Conference on Planning 2001 (ECP-01), 2001.
[BibTeX] [Download PDF]@article{Tsamardinos2001, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Pollack, ME and Ganchev, P}, biburl = {https://www.bibsonomy.org/bibtex/2e1d846371019741131439e53d45d6d69/mensxmachina}, const = {\ text}, interhash = {04a454d32f04f579fb9a47c1a8d505b2}, intrahash = {e1d846371019741131439e53d45d6d69}, journal = {Proceedings of Sixth European Conference on Planning 2001 (ECP-01)}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Flexible dispatch of disjunctive plans}, url = {http://www.aaai.org/ocs/index.php/ECP/ECP01/paper/view/7769}, year = 2001 }
2000
- I. Tsamardinos, M. Pollack, and J. Horty, "Merging Plans with Quantitative Temporal Constraints, Temporally Extended Actions, and Conditional Branches," Proceedings of the 5th International Conference on AI Planning and Scheduling (AIPS 2000), Breckenridge, CO, April 2000.
[BibTeX] [Download PDF]@article{Tsamardinos2000, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Pollack, ME and Horty, JF}, biburl = {https://www.bibsonomy.org/bibtex/26f2f38a156f693be2221d9a70671b6ce/mensxmachina}, const = {\ text}, interhash = {db61347457b16b9a9d506bbd0670ef7c}, intrahash = {6f2f38a156f693be2221d9a70671b6ce}, journal = {Proceedings of the 5th International Conference on AI Planning and Scheduling (AIPS 2000), Breckenridge, CO, April, 2000}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Merging Plans with Quantitative Temporal Constraints, Temporally Extended Actions, and Conditional Branches.}, url = {http://www.aaai.org/Papers/AIPS/2000/AIPS00-028.pdf}, year = 2000 }
1999
- M. Pollack, I. Tsamardinos, and J. Horty, "Adjustable autonomy for a plan management agent," Proceedings of the 1999 AAAI Spring Symposium on Adjustable Autonomy, 1999.
[BibTeX] [Download PDF]@article{Pollack1999, added-at = {2018-12-23T19:41:26.000+0100}, author = {Pollack, ME and Tsamardinos, I and Horty, JF}, biburl = {https://www.bibsonomy.org/bibtex/22ce59a05f4cff48a66e0b7306a6a985b/mensxmachina}, const = {\ text}, interhash = {75a957bcb58791d35f93d88dffce3ceb}, intrahash = {2ce59a05f4cff48a66e0b7306a6a985b}, journal = {Proceedings of the 1999 AAAI Spring Symposium on Adjustable Autonomy}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Adjustable autonomy for a plan management agent}, url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.79.6259{\&}rep=rep1{\&}type=pdf}, year = 1999 }
1998
- C. Bicchieri, M. Pollack, C. Rovelli, and I. Tsamardinos, "Bicchieri-The_Potential_for_the_Evolution_of_Co-operatin.pdf," International Journal of Computer-Human Systems, iss. 1, 1998.
[BibTeX]@article{Bicchieri1998, added-at = {2018-12-23T19:41:26.000+0100}, author = {Bicchieri, C. and Pollack, M. and Rovelli, C. and Tsamardinos, I.}, biburl = {https://www.bibsonomy.org/bibtex/217da3e94c45d4f7ae094ab8d72a65abd/mensxmachina}, const = {\ text}, interhash = {a41f3544fbae8ab5101e2eeab94bb7d5}, intrahash = {17da3e94c45d4f7ae094ab8d72a65abd}, journal = {International Journal of Computer-Human Systems}, keywords = {imported}, number = 1, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Bicchieri-The\_Potential\_for\_the\_Evolution\_of\_Co-operatin.pdf}, year = 1998 }
- I. Tsamardinos, N. Muscettola, and P. Morris, "Fast transformation of temporal plans for efficient execution," Proceedings of the 15th National Conference on Artificial Intelligence (AAAI'98), 1998.
[BibTeX] [Download PDF]@article{Tsamardinos1998, added-at = {2018-12-23T19:41:26.000+0100}, author = {Tsamardinos, I and Muscettola, N and Morris, P}, biburl = {https://www.bibsonomy.org/bibtex/2ba5ac529fb3d4c80c0123c8b13d0b652/mensxmachina}, const = {\ text}, interhash = {37dcee99a6de853d92a735d6228be384}, intrahash = {ba5ac529fb3d4c80c0123c8b13d0b652}, journal = {Proceedings of the 15th National Conference on Artificial Intelligence (AAAI'98)}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Fast transformation of temporal plans for efficient execution}, url = {http://www.aaai.org/Papers/AAAI/1998/AAAI98-035.pdf}, year = 1998 }
- N. Muscettola, P. Morris, and I. Tsamardinos, "Reformulating temporal plans for efficient execution," Proceedings of the 6th Conference Principles of Knowledge Representation and Reasoning (KR), 1998.
[BibTeX] [Download PDF]@article{Muscettola1998, added-at = {2018-12-23T19:41:26.000+0100}, author = {Muscettola, N and Morris, P and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/24cd59e36717b5bc213ee5a1591977310/mensxmachina}, const = {\ text}, interhash = {343763a35e2fb30b12932b15849799d2}, intrahash = {4cd59e36717b5bc213ee5a1591977310}, journal = {Proceedings of the 6th Conference Principles of Knowledge Representation and Reasoning (KR)}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Reformulating temporal plans for efficient execution}, url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.8035}, year = 1998 }
1995
- S. Orphanoudakis, M. Tsiknakis, C. Chronaki, S. Kostomanolakis, M. Zikos, and I. Tsamardinos, "Development of an Integrated Image Management and Communication System on Crete," Proceedings of Computed Aided Radiology '95, 1995.
[BibTeX]@article{Orphanoudakis1995, added-at = {2018-12-23T19:41:26.000+0100}, author = {Orphanoudakis, S. and Tsiknakis, M. and Chronaki, C. and Kostomanolakis, S. and Zikos, M. and Tsamardinos, I}, biburl = {https://www.bibsonomy.org/bibtex/2e77c876488c8fc523e47e3b817116d69/mensxmachina}, const = {\ text}, interhash = {d4e3d4d0d8cb672f3927064bac615341}, intrahash = {e77c876488c8fc523e47e3b817116d69}, journal = {Proceedings of Computed Aided Radiology '95}, keywords = {imported}, timestamp = {2018-12-23T21:03:57.000+0100}, title = {Development of an Integrated Image Management and Communication System on Crete}, year = 1995 }
About Us
Mens Ex Machina, Mind from the Machine or “Ο από Μηχανής Νους”, paraphrases the Latin expression Deus Ex Machina, God from the Machine. The name was suggested by Lucy Sofiadou, Prof. Tsamardinos’ wife.
We are a research group, founded in October 2006 and led by Professor Ioannis Tsamardinos, interested in Artificial Intelligence, Machine Learning, and Biomedical Informatics, and affiliated with the Computer Science Department of the University of Crete. The aims of the group are to advance science and disseminate knowledge via educational activities and computer tools. Our group is involved in:
Research:
Theoretical, algorithmic, and applied research in all of the above areas; we are also involved in interdisciplinary collaborations with biologists, physicians and practitioners from other fields.
Education:
Educational activities, such as teaching university courses, tutorials, and summer schools, as well as supervising undergraduate dissertations, master's projects, and Ph.D. theses.
Systems and Software:
Implementation of tools, systems, and code libraries to aid the dissemination of the research results. Funding is provided from and through the University of Crete, often originating from European and International research grants.
Current research activities include, but are not limited to, the following:
- Causal discovery methods and the induction of causal models from observational studies. Specifically, we have recently introduced the problem of Integrative Causal Analysis (INCA).
- Feature selection (a.k.a. variable selection) for classification and regression.
- Induction of graphical models, such as Bayesian Networks, from data.
- Analysis of biomedical data and applications of AI and Machine Learning methods to induce new biomedical knowledge.
- Activity recognition in Ambient Intelligence environments.
Ioannis Tsamardinos
Professor, Department of Computer Science, University of Crete