Research

Overview

With a lifelong interest in computational biology, I have focused more specifically on systems biology since my master's degree. I am interested in solving combinatorial problems using Answer Set Programming (ASP), a paradigm I discovered during my master's internship while trying to solve a reachability problem in a graph. I currently use ASP to model human preimplantation development, the subject of my Ph.D. thesis, through the inference of Boolean networks. I want to continue developing methods to analyze complex human biological systems.

I am currently looking for a postdoc. If you know of any exciting opportunities, feel free to reach out!

Publications

2024

  1. Boolean Network Models of Human Preimplantation Development.
    M. Bolteau, L. Chebouba, L. David, J. Bourdon, & C. Guziolowski. (2024). Journal of Computational Biology. doi:10.1089/cmb.2024.0517
    Abstract
    Single-cell transcriptomic studies of differentiating systems allow meaningful understanding, especially in human embryonic development and cell fate determination. We present an innovative method aimed at modeling these intricate processes by leveraging scRNAseq data from various human developmental stages. Our implemented method identifies pseudo-perturbations, since actual perturbations are unavailable due to ethical and technical constraints. By integrating these pseudo-perturbations with prior knowledge of gene interactions, our framework generates stage-specific Boolean networks (BNs). We apply our method to medium and late trophectoderm developmental stages and identify 20 pseudo-perturbations required to infer BNs. The resulting BN families delineate distinct regulatory mechanisms, enabling the differentiation between these developmental stages. We show that our program outperforms existing pseudo-perturbation identification tool. Our framework contributes to comprehending human developmental processes and holds potential applicability to diverse developmental stages and other research scenarios.
    BibTeX
    @article{Bolteau2024,
      author = {Bolteau, Mathieu and Chebouba, Lokmane and David, Laurent and Bourdon, Jérémie and Guziolowski, Carito},
      journal = {Journal of Computational Biology},
      title = {Boolean {Network} {Models} of {Human} {Preimplantation} {Development}},
      year = {2024},
      month = may,
      abstract = {Single-cell transcriptomic studies of differentiating systems allow meaningful understanding, especially in human embryonic development and cell fate determination. We present an innovative method aimed at modeling these intricate processes by leveraging scRNAseq data from various human developmental stages. Our implemented method identifies pseudo-perturbations, since actual perturbations are unavailable due to ethical and technical constraints. By integrating these pseudo-perturbations with prior knowledge of gene interactions, our framework generates stage-specific Boolean networks (BNs). We apply our method to medium and late trophectoderm developmental stages and identify 20 pseudo-perturbations required to infer BNs. The resulting BN families delineate distinct regulatory mechanisms, enabling the differentiation between these developmental stages. We show that our program outperforms existing pseudo-perturbation identification tool. Our framework contributes to comprehending human developmental processes and holds potential applicability to diverse developmental stages and other research scenarios.},
      doi = {10.1089/cmb.2024.0517},
      file = {Full Text PDF:https\://www.liebertpub.com/doi/pdf/10.1089/cmb.2024.0517:application/pdf},
      publisher = {Mary Ann Liebert, Inc., publishers},
      url = {https://www.liebertpub.com/doi/abs/10.1089/cmb.2024.0517},
      urldate = {2024-06-03},
      preprint = {https://hal.science/hal-04579386},
      month_numeric = {5}
    }
    

  2. Seed2LP: seed inference in metabolic networks for reverse ecology applications.
    C. Ghassemi Nedjad, M. Bolteau, L. Bourneuf, L. Paulevé, & C. Frioux. (2024). bioRxiv. doi:10.1101/2024.09.26.615309
    Abstract
    A challenging problem in microbiology is to determine nutritional requirements of microorganisms and culture them, especially for the microbial dark matter detected solely with culture-independent methods. The latter foster an increasing amount of genomic sequences that can be explored with reverse ecology approaches to raise hypotheses on the corresponding populations. Building upon genome scale metabolic networks (GSMNs) obtained from genome annotations, metabolic models predict contextualised phenotypes using nutrient information. We developed the tool Seed2LP, addressing the inverse problem of predicting source nutrients, or seeds, from a GSMN and a metabolic objective. The originality of Seed2LP is its hybrid model, combining a scalable and discrete Boolean approximation of metabolic activity, with the numerically accurate flux balance analysis (FBA). Seed inference is highly customisable, with multiple search and solving modes, exploring the search space of external and internal metabolites combinations. Application to a benchmark of 107 curated GSMNs highlights the usefulness of a logic modelling method over a graph-based approach to predict seeds, and the relevance of hybrid solving to satisfy FBA constraints. Focusing on the dependency between metabolism and environment, Seed2LP is a computational support contributing to address the multifactorial challenge of culturing possibly uncultured microorganisms. Seed2LP is available on https://github.com/bioasp/seed2lp.
    BibTeX
    @article{Ghassemi2024,
      author = {Ghassemi Nedjad, Chabname and Bolteau, Mathieu and Bourneuf, Lucas and Paulev{\'e}, Lo{\"\i}c and Frioux, Cl{\'e}mence},
      title = {Seed2LP: seed inference in metabolic networks for reverse ecology applications},
      elocation-id = {2024.09.26.615309},
      year = {2024},
      doi = {10.1101/2024.09.26.615309},
      publisher = {Cold Spring Harbor Laboratory},
      abstract = {A challenging problem in microbiology is to determine nutritional requirements of microorganisms and culture them, especially for the microbial dark matter detected solely with culture-independent methods. The latter foster an increasing amount of genomic sequences that can be explored with reverse ecology approaches to raise hypotheses on the corresponding populations. Building upon genome scale metabolic networks (GSMNs) obtained from genome annotations, metabolic models predict contextualised phenotypes using nutrient information. We developed the tool Seed2LP, addressing the inverse problem of predicting source nutrients, or seeds, from a GSMN and a metabolic objective. The originality of Seed2LP is its hybrid model, combining a scalable and discrete Boolean approximation of metabolic activity, with the numerically accurate flux balance analysis (FBA). Seed inference is highly customisable, with multiple search and solving modes, exploring the search space of external and internal metabolites combinations. Application to a benchmark of 107 curated GSMNs highlights the usefulness of a logic modelling method over a graph-based approach to predict seeds, and the relevance of hybrid solving to satisfy FBA constraints. Focusing on the dependency between metabolism and environment, Seed2LP is a computational support contributing to address the multifactorial challenge of culturing possibly uncultured microorganisms. Seed2LP is available on https://github.com/bioasp/seed2lp.},
      url = {https://www.biorxiv.org/content/early/2024/09/27/2024.09.26.615309},
      eprint = {https://www.biorxiv.org/content/early/2024/09/27/2024.09.26.615309.full.pdf},
      journal = {bioRxiv},
      preprint = {https://www.biorxiv.org/content/early/2024/09/27/2024.09.26.615309}
    }
    

  3. Logic programs to infer computational models of human embryonic development.
    M. Bolteau. (2024).
    BibTeX
    @phdthesis{Bolteau2024_phd,
      title = {Logic programs to infer computational models of human embryonic development},
      author = {Bolteau, Mathieu},
      year = {2024},
      note = {s389270}
    }
    

2023

  1. Inferring Boolean Networks from Single-Cell Human Embryo Datasets.
    M. Bolteau, J. Bourdon, L. David, & C. Guziolowski. (2023). In Bioinformatics Research and Applications, X. Guo, S. Mangul, M. Patterson, & A. Zelikovsky (Eds.). doi:10.1007/978-981-99-7074-2_34
    Abstract
    This study aims to understand human embryonic development and cell fate determination, specifically in relation to trophectoderm (TE) maturation. We utilize single-cell transcriptomics (scRNAseq) data to develop a framework for inferring computational models that distinguish between two developmental stages. Our method selects pseudo-perturbations from scRNAseq data since actual perturbations are impractical due to ethical and legal constraints. These pseudo-perturbations consist of input-output discretized expressions, for a limited set of genes and cells. By combining these pseudo-perturbations with prior-regulatory networks, we can infer Boolean networks that accurately align with scRNAseq data for each developmental stage. Our publicly available method was tested with several benchmarks, proving the feasibility of our approach. Applied to the real dataset, we infer Boolean network families, corresponding to the medium and late TE developmental stages. Their structures reveal contrasting regulatory pathways, offering valuable biological insights and hypotheses within this domain.
    BibTeX
    @inproceedings{Bolteau2023,
      author = {Bolteau, Mathieu and Bourdon, Jérémie and David, Laurent and Guziolowski, Carito},
      booktitle = {Bioinformatics {Research} and {Applications}},
      title = {Inferring {Boolean} {Networks} from {Single}-{Cell} {Human} {Embryo} {Datasets}},
      year = {2023},
      address = {Singapore},
      editor = {Guo, Xuan and Mangul, Serghei and Patterson, Murray and Zelikovsky, Alexander},
      pages = {431--441},
      publisher = {Springer Nature},
      series = {Lecture {Notes} in {Computer} {Science}},
      abstract = {This study aims to understand human embryonic development and cell fate determination, specifically in relation to trophectoderm (TE) maturation. We utilize single-cell transcriptomics (scRNAseq) data to develop a framework for inferring computational models that distinguish between two developmental stages. Our method selects pseudo-perturbations from scRNAseq data since actual perturbations are impractical due to ethical and legal constraints. These pseudo-perturbations consist of input-output discretized expressions, for a limited set of genes and cells. By combining these pseudo-perturbations with prior-regulatory networks, we can infer Boolean networks that accurately align with scRNAseq data for each developmental stage. Our publicly available method was tested with several benchmarks, proving the feasibility of our approach. Applied to the real dataset, we infer Boolean network families, corresponding to the medium and late TE developmental stages. Their structures reveal contrasting regulatory pathways, offering valuable biological insights and hypotheses within this domain.},
      doi = {10.1007/978-981-99-7074-2_34},
      file = {Full Text PDF:https\://link.springer.com/content/pdf/10.1007%2F978-981-99-7074-2_34.pdf:application/pdf},
      isbn = {9789819970742},
      keywords = {Boolean networks, Answer Set Programming, Human preimplantation development, scRNAseq modeling},
      language = {en},
      preprint = {https://hal.science/hal-04206397}
    }
    

  2. Predicting weighted unobserved nodes in a regulatory network using answer set programming.
    S. Le Bars, M. Bolteau, J. Bourdon, & C. Guziolowski. (2023). BMC Bioinformatics. doi:10.1186/s12859-023-05429-3
    Abstract
    The impact of a perturbation, over-expression, or repression of a key node on an organism, can be modelled based on a regulatory and/or metabolic network. Integration of these two networks could improve our global understanding of biological mechanisms triggered by a perturbation. This study focuses on improving the modelling of the regulatory network to facilitate a possible integration with the metabolic network. Previously proposed methods that study this problem fail to deal with a real-size regulatory network, computing predictions sensitive to perturbation and quantifying the predicted species behaviour more finely.
    BibTeX
    @article{LeBars2023,
      author = {Le Bars, Sophie and Bolteau, Mathieu and Bourdon, Jérémie and Guziolowski, Carito},
      journal = {BMC Bioinformatics},
      title = {Predicting weighted unobserved nodes in a regulatory network using answer set programming},
      year = {2023},
      issn = {1471-2105},
      number = {1},
      pages = {321},
      volume = {24},
      abstract = {The impact of a perturbation, over-expression, or repression of a key node on an organism, can be modelled based on a regulatory and/or metabolic network. Integration of these two networks could improve our global understanding of biological mechanisms triggered by a perturbation. This study focuses on improving the modelling of the regulatory network to facilitate a possible integration with the metabolic network. Previously proposed methods that study this problem fail to deal with a real-size regulatory network, computing predictions sensitive to perturbation and quantifying the predicted species behaviour more finely.},
      doi = {10.1186/s12859-023-05429-3},
      refid = {Le Bars2023},
      url = {https://doi.org/10.1186/s12859-023-05429-3},
      preprint = {https://hal.science/hal-04047587v1}
    }
    

2021

  1. The SSV-Seq 2.0 PCR-Free Method Improves the Sequencing of Adeno-Associated Viral Vector Genomes Containing GC-Rich Regions and Homopolymers.
    E. Lecomte, S. Saleun, M. Bolteau, A. Guy-Duché, O. Adjali, V. Blouin, M. Penaud-Budloo, & E. Ayuso. (2021). Biotechnology Journal. doi:10.1002/biot.202000016
    Abstract
    Adeno-associated viral vectors (AAV) are efficient engineered tools for delivering genetic material into host cells. The commercialization of AAV-based drugs must be accompanied by the development of appropriate quality control (QC) assays. Given the potential risk of co-transfer of oncogenic or immunogenic sequences with therapeutic vectors, accurate methods to assess the level of residual DNA in AAV vector stocks are particularly important. An assay based on high-throughput sequencing (HTS) to identify and quantify DNA species in recombinant AAV batches is developed. Here, it is shown that PCR amplification of regions that have a local GC content >90% and include successive mononucleotide stretches, such as the CAG promoter, can introduce bias during DNA library preparation, leading to drops in sequencing coverage. To circumvent this problem, SSV-Seq 2.0, a PCR-free protocol for sequencing AAV vector genomes containing such sequences, is developed. The PCR-free protocol improves the evenness of the rAAV genome coverage and consequently leads to a more accurate relative quantification of residual DNA. HTS-based assays provide a more comprehensive assessment of DNA impurities and AAV vector genome integrity than conventional QC tests based on real-time PCR and are useful methods to improve the safety and efficacy of these viral vectors.
    BibTeX
    @article{Lecomte2021,
      author = {Lecomte, Emilie and Saleun, Sylvie and Bolteau, Mathieu and Guy-Duché, Aurélien and Adjali, Oumeya and Blouin, Véronique and Penaud-Budloo, Magalie and Ayuso, Eduard},
      title = {The SSV-Seq 2.0 PCR-Free Method Improves the Sequencing of Adeno-Associated Viral Vector Genomes Containing GC-Rich Regions and Homopolymers},
      journal = {Biotechnology Journal},
      volume = {16},
      number = {1},
      pages = {2000016},
      keywords = {AAV vectors, GC-content, high-throughput sequencing, homopolymers, PCR-free library},
      doi = {10.1002/biot.202000016},
      url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/biot.202000016},
      eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/biot.202000016},
      abstract = {Adeno-associated viral vectors (AAV) are efficient engineered tools for delivering genetic material into host cells. The commercialization of AAV-based drugs must be accompanied by the development of appropriate quality control (QC) assays. Given the potential risk of co-transfer of oncogenic or immunogenic sequences with therapeutic vectors, accurate methods to assess the level of residual DNA in AAV vector stocks are particularly important. An assay based on high-throughput sequencing (HTS) to identify and quantify DNA species in recombinant AAV batches is developed. Here, it is shown that PCR amplification of regions that have a local GC content >90\% and include successive mononucleotide stretches, such as the CAG promoter, can introduce bias during DNA library preparation, leading to drops in sequencing coverage. To circumvent this problem, SSV-Seq 2.0, a PCR-free protocol for sequencing AAV vector genomes containing such sequences, is developed. The PCR-free protocol improves the evenness of the rAAV genome coverage and consequently leads to a more accurate relative quantification of residual DNA. HTS-based assays provide a more comprehensive assessment of DNA impurities and AAV vector genome integrity than conventional QC tests based on real-time PCR and are useful methods to improve the safety and efficacy of these viral vectors.},
      year = {2021}
    }
    