This tutorial explains how PlaNet was used to arrive at the conclusions presented in our "Comparative phylogenomic analysis of gene co-expression networks reveals evolution of functional modules in plants" paper. For brevity, we only use the photosynthesis example, as the results for the cell walls can be obtained by following similar steps. We assume that you are already familiar with PlaNet. If this is not the case, please visit the Help page. If you are still stuck, please write to the corresponding author for further help.
Gene pages of Physcomitrella patens - exemplified by Pp1s185_110v6.1, a photosystem II subunit PsbW protein
By entering and selecting the Pp1s185_110v6.1 gene identifier on the Home page, you will be redirected to the gene page. The gene page contains several items, which are:
1. Annotation of the gene. The elements are:
- source species of the gene
- microarray probeset ID
- gene ID
- short annotation
- the co-expression cluster the gene belongs to
- Pfam domains (PsbW)
- PLAZA gene families (HOM003641)
- sub-families (ORTHO009355) of the gene
2. Expression profile of the gene Pp1s185_110v6.1
3. The co-expression network of Pp1s185_110v6.1 is based on the second neighborhood, which is generated by collecting all genes that are at most two steps away from Pp1s185_110v6.1 (large node in the center).
The networks are interactive and the nodes can be moved by dragging them.
The network is quite large, so let's make it smaller and more readable. The right-click menu offers two options for this. First, you can click on "Toggle second level neighborhood". This option will show the first neighborhood only, i.e. nodes that are directly connected to Pp1s185_110v6.1.
Much better! The network can be further reduced by clicking on "Toggle nodes supported by ELA". This option removes genes that are not found to be associated with PsbW domain genes in multiple species (the eight angiosperms present in PlaNet).
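The first and second neighborhoods described in item 3 can be sketched as a breadth-first search that stops after a fixed number of steps. This is only an illustration under our own assumptions: the function name `neighborhood`, the adjacency dictionary and the toy gene names (geneA to geneD) are hypothetical and are not taken from PlaNet's implementation.

```python
from collections import deque

def neighborhood(adjacency, query, max_steps=2):
    """Return all genes reachable from `query` within `max_steps` edges (query excluded)."""
    distance = {query: 0}
    queue = deque([query])
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_steps:
            continue  # do not expand beyond the requested number of steps
        for partner in adjacency.get(gene, ()):
            if partner not in distance:
                distance[partner] = distance[gene] + 1
                queue.append(partner)
    return {gene for gene, steps in distance.items() if steps > 0}

# Toy, made-up co-expression network; only the query identifier comes from the tutorial.
toy_network = {
    "Pp1s185_110v6.1": {"geneA", "geneB"},
    "geneA": {"Pp1s185_110v6.1", "geneC"},
    "geneB": {"Pp1s185_110v6.1"},
    "geneC": {"geneA", "geneD"},
    "geneD": {"geneC"},
}

first = neighborhood(toy_network, "Pp1s185_110v6.1", max_steps=1)
second = neighborhood(toy_network, "Pp1s185_110v6.1", max_steps=2)
```

In this toy network, `first` holds the directly connected genes, while `second` additionally holds genes that are two steps away. This mirrors what "Toggle second level neighborhood" switches between.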
4. Description of label co-occurrences.
The next item explains which gene families and Pfam domains are associated with the label co-occurrences found in the second neighborhood.
There are lots of them, since the co-expression network of the second neighborhood is large and contains a large number of different gene families and Pfam domains. Let's zoom in:
5. Text-based gene composition of the second neighborhood of Pp1s185_110v6.1.
The table provides probeset IDs, gene IDs, annotations and which labels, i.e. Pfam domains and gene families are associated with the genes.
6. Table containing GO term enrichment of the genes in the second neighborhood.
The table can tell you the biological process (BP, i.e. what biological process the neighborhood is involved in), cellular component (CC, i.e. where in the cell the neighborhood acts) and molecular function (MF, i.e. which enzymatic/structural function the neighborhood has).
7. Gene module network, which shows second neighborhoods that are similar to the query neighborhood of Pp1s185_110v6.1 (central node).
Please see Help for a more detailed explanation of the gene module network. Similar neighborhoods found in species other than the query are called conserved modules (connected to the query by blue edges), while similar neighborhoods found within the same species are called duplicated modules (connected to the query by green edges). Since the photosynthetic module is not duplicated in Physcomitrella, the network does not show any duplicated modules. The network indicates that there are conserved modules in all species in PlaNet, which is not surprising, since photosynthesis takes place in all plants. Modules connected by orange edges are overlapping modules. These overlapping modules represent genes that are co-expressed (i.e. they can be found in each other's co-expression networks).
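The distinction between conserved and duplicated modules can be written down as a small rule. The sketch below only restates the naming and edge colors given in the text. The function name `classify_similar_module` and the data layout are our own assumptions, not PlaNet code.

```python
def classify_similar_module(query_species, module_species):
    """Return (module type, edge color) for a neighborhood similar to the query."""
    if module_species == query_species:
        return "duplicated", "green"  # similar neighborhood within the same species
    return "conserved", "blue"        # similar neighborhood in another species

# Modules 1 and 8 are the Arabidopsis and rice modules used later in this tutorial.
similar_modules = [("module_1", "Arabidopsis"), ("module_8", "rice")]
module_types = {
    name: classify_similar_module("Physcomitrella", species)
    for name, species in similar_modules
}
```

Because neither example module comes from Physcomitrella, both are classified as conserved, in line with the absence of duplicated modules in the network.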
8. The final item shows a text representation of the gene module network, and allows selecting gene modules for further analysis.
The selection can be done by clicking on the check boxes that are found in the "Probeset/other ID" column. Below the table, you can select further options that will influence the size of the modules. Notice that there are sometimes multiple genes in this table. These genes are grouped due to overlap (orange edges in the module network). You can select one or multiple of the overlapping genes. We prefer to select one representative gene with the highest label co-occurrence score.
Further options for module comparison can be toggled by clicking on the check boxes below the table.
- The first check box tells PlaNet to look at first neighborhoods only, which will make the gene modules smaller. PlaNet uses second neighborhoods by default.
- The second check box will show all genes in the gene modules, which will make the modules much larger. By default, PlaNet shows only label co-occurrences that are present in at least two modules.
- The third check box will use the ELA filter to remove genes that are not supported by ELA, which will make the modules smaller. By default, PlaNet does not use the ELA filter when showing gene module contents.
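The default behavior behind the second check box, keeping only labels that are present in at least two modules, can be sketched as a simple count. The function `shared_labels`, the module names and the placeholder labels `labelX` and `labelY` are hypothetical. Only PsbW and HOM003641 are labels mentioned in this tutorial, and their assignment to modules here is made up for illustration.

```python
from collections import Counter

def shared_labels(modules, min_modules=2):
    """Return the labels that occur in at least `min_modules` of the given modules."""
    counts = Counter()
    for labels in modules.values():
        counts.update(set(labels))  # count each label once per module
    return {label for label, n in counts.items() if n >= min_modules}

# Made-up module contents (labels are Pfam domains or gene families).
toy_modules = {
    "query": {"PsbW", "HOM003641", "labelX"},
    "module_1": {"PsbW", "HOM003641"},
    "module_8": {"PsbW", "labelY"},
}

kept = shared_labels(toy_modules)
```

Ticking the second check box would correspond to skipping this filter, so that labels such as `labelX` and `labelY` would also be shown.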
After you are happy with your selection, click on the Compare button. For the analysis shown on Figure 3 (in the manuscript), we have selected the most similar gene modules in Arabidopsis, rice and the query gene.
9. Analysis of selected modules.
The first item on the page explains the contents of the gene module content networks. Modules are shown as boxes (two modules are shown), while genes found in the modules are visualized as nodes. Edges between modules indicate when the connected genes arose by speciation or duplication.
10. The gene module content network, showing gene contents of the three photosynthetic modules selected in previous step.
This is the network we used to generate Figure 7A in the paper. This network is not as good looking as the one in the paper, as the layout algorithm sometimes struggles to place the nodes in the optimal position. For the paper, we arranged the nodes manually and then exported the network as PDF. To make Figure 7A more readable, we have removed the gene identifiers, but you can see them on PlaNet. Note that the phylostratigraphic enrichment is shown next to the modules.
11. The phylogenetic and phylostratigraphic information is visualized as colored edges connecting the modules and node border colors, respectively (see below).
For example, genes connected by red dashed edges are related to one another by a land plant speciation event. In this example, genes between Physcomitrella and Arabidopsis show land plant speciation events (i.e. the split of bryophytes and vascular plants). Phylostratigraphic information is indicated by node border colors. For example, green borders indicate that a gene belongs to the Green Plant phylostratum.
12. Description of label co-occurrences and simplified view of the modules.
The next item shows a legend describing labels (gene families and Pfam domains) found in the modules. Furthermore, a simplified view of the modules indicates which labels can be found in each module. This image was used to generate the evolutionary model shown in Figure 7C.
13. Table showing enrichment of duplication/speciation events found between the modules.
The table can tell you (i) the type (speciation/duplication), (ii) the phylostratum (Green Plants, Land Plants, ...), (iii) the number of times the phylostratum was observed between the two modules and (iv) the P-value, whose sign indicates whether a given phylostratum is enriched (0.0<=P<0.05) or depleted (-0.05<P<=-0.0). Here, modules 8 and 1 (rice and Arabidopsis) show a significant enrichment of angiosperm speciation edges, while modules 8 and query (rice and Physcomitrella) show significant land plant speciation. Note that the table uses the same color and line style as the network to indicate the event (i.e. dashed red edge = speciation in land plants).
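The signed P-value convention used in the table can be decoded as follows. This is a minimal sketch that assumes the values are read into Python floats and that 0.05 is the significance threshold, as stated in the text. The function name `interpret_signed_p` is hypothetical.

```python
import math

def interpret_signed_p(signed_p, alpha=0.05):
    """Map a signed P-value to 'enriched', 'depleted' or 'not significant'."""
    if abs(signed_p) >= alpha:
        return "not significant"
    # -0.0 and 0.0 compare equal in Python, so read the sign bit explicitly.
    if math.copysign(1.0, signed_p) < 0:
        return "depleted"
    return "enriched"

examples = {p: interpret_signed_p(p) for p in (0.01, -0.01, 0.2)}
```

The explicit sign check matters for values displayed as -0.0: a plain `signed_p < 0` comparison would treat them as non-negative and report enrichment instead of depletion.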
14. The next table shows which phylostrata are enriched in the three gene modules.
The table lists (i) the module ID, (ii) the phylostratum, (iii) the number of times a phylostratum was found in a module and (iv) the P-value of the enrichment/depletion. As in the table above, 0.0<=P<0.05 indicates enrichment, while -0.05<P<=-0.0 indicates depletion. So, for the Physcomitrella module, we observe an enrichment of the Green Plant phylostratum, but a depletion of the Physcomitrella-specific phylostratum. The enrichment of the Green Plant phylostratum is in line with the ancient origin of photosynthesis. The depletion of younger phylostrata suggests that photosynthesis has not been "improved" by the addition of new genes in Physcomitrella.
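To give an intuition for what an enrichment P-value of this kind measures, the sketch below computes a one-sided hypergeometric tail probability. Please note that this is an assumption made purely for illustration: this tutorial does not state which statistical test PlaNet uses, and the function name and all numbers are made up.

```python
from math import comb

def hypergeom_upper_tail(k, module_size, stratum_total, genome_size):
    """P(X >= k): chance of seeing at least k genes of a phylostratum in a module
    of `module_size` genes drawn at random from `genome_size` genes, of which
    `stratum_total` belong to that phylostratum."""
    denominator = comb(genome_size, module_size)
    upper = min(module_size, stratum_total)
    numerator = sum(
        comb(stratum_total, i) * comb(genome_size - stratum_total, module_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator

# Made-up toy numbers: 10 genes in total, 4 of them in the phylostratum,
# and a module of 3 genes that all belong to the phylostratum.
p_enrichment = hypergeom_upper_tail(k=3, module_size=3, stratum_total=4, genome_size=10)
```

A small value means that the phylostratum is found in the module more often than expected by chance. Depletion would be assessed analogously with the lower tail.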
Based on these results, we can now estimate the evolution of photosynthesis, since we know (i) the identity of genes and label co-occurrences, (ii) the speciation/duplication events between modules and (iii) the phylostratigraphic enrichment of the modules. Taken together, we can propose the model shown in Figure 7C:
15. Below, you can see the expression profile of the module centers, that is, the genes that are the query genes for the modules.
Here, we exemplify the expression of Arabidopsis and Physcomitrella PsbW genes.
16. The last two items are (i) a table showing which genes are present in the modules (not shown) and (ii) which GO slim (simplified GO terms) are enriched in the modules.
The enrichment analysis can reveal a biological function of the modules. In this case, modules 1 and 8 contain genes that are involved in photosynthesis (BP = biological process) and localized to thylakoid (CC = cellular component).