Interpretation of results from trees is usually straightforward. In Fig. 14.1, we identified genes IL-8, CANX, and RAB3B whose expression levels are highly predictive of colon cancer. However, this does not necessarily mean that these genes cause colon cancer. Such a conclusion requires a thorough search of the literature and further experiments. For example, after reviewing the literature, [81] found evidence that associates IL-8 with the stage of colon cancer ([30]), the migration of human colonic epithelial cell lines ([70]), and metastasis of bladder cancer ([42]). In addition, the expression of the molecular chaperone CANX was found to be down-regulated in HT-29 human colon adenocarcinoma cells ([72]) and to be involved in apoptosis in human prostate epithelial tumor cells ([56]). Lastly, RAB3B is a member of the RAS oncogene family. These existing studies therefore provide independent support that the three genes identified in Fig. 14.1 may be in the pathways of colon cancer. If this hypothesis could be confirmed by further experiments, Fig. 14.1 would have another important implication. Pathologically speaking, the colon cancer samples are indistinguishable, yet Fig. 14.1 indicates that those samples are not homogeneous in terms of gene expression levels. If confirmed, such a finding could be useful in cancer diagnosis and treatment.
As we stated earlier, there are numerous applications of decision trees in biomedical research, including the example above. To have a glimpse of the diverse applications of decision trees, let us review two different examples.
We can see from Fig. 14.2 that the risk of bankruptcy is relatively high if the ratio of cash flow to total debt is below 0.1309, unless both the ratio of retained earnings to total assets and the ratio of cash to total sales are above certain levels, i.e., 0.1453 and 0.025, respectively. Even if the ratio of cash flow to total debt is above 0.1309, there can be elevated risk of bankruptcy if the ratio of total debt to total assets is high (above 0.6975). A tree diagram as in Fig. 14.2 offers a very clear and simple assessment of the financial state of a company.
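The reading of Fig. 14.2 given above can be written out as a small set of nested rules. The following is a minimal sketch in Python that encodes only what the paragraph states; it is not a reconstruction of the fitted tree. The function name, the argument names, the labels "elevated" and "lower", and the treatment of values that fall exactly on a threshold are all assumptions made for illustration.

```python
# Thresholds quoted in the text describing Fig. 14.2.
CASH_FLOW_TO_TOTAL_DEBT = 0.1309
RETAINED_EARNINGS_TO_TOTAL_ASSETS = 0.1453
CASH_TO_TOTAL_SALES = 0.025
TOTAL_DEBT_TO_TOTAL_ASSETS = 0.6975


def bankruptcy_risk(cash_flow_to_debt, retained_earnings_to_assets,
                    cash_to_sales, debt_to_assets):
    """Return 'elevated' or 'lower' following the prose reading of Fig. 14.2.

    Hypothetical helper: the labels and the handling of ties at a
    threshold are assumptions, not taken from the figure itself.
    """
    if cash_flow_to_debt < CASH_FLOW_TO_TOTAL_DEBT:
        # Low cash flow relative to debt: risk is high unless BOTH of the
        # other two ratios are above their thresholds.
        if (retained_earnings_to_assets > RETAINED_EARNINGS_TO_TOTAL_ASSETS
                and cash_to_sales > CASH_TO_TOTAL_SALES):
            return "lower"
        return "elevated"
    # Adequate cash flow relative to debt: risk can still be elevated
    # when total debt is high relative to total assets.
    if debt_to_assets > TOTAL_DEBT_TO_TOTAL_ASSETS:
        return "elevated"
    return "lower"


# Example: low cash flow, but both compensating ratios are above threshold.
print(bankruptcy_risk(0.05, 0.20, 0.03, 0.50))
```

Writing the tree this way makes the point of the paragraph concrete: each path from the root to a terminal node is a short conjunction of conditions on financial ratios, which is why the diagram is easy to read.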
Figure 14.3 presents part of the regression tree constructed by [15]. We trimmed the left-hand side to fit the space here; however, the right-hand side of the tree is enough to convey the idea. Generally speaking, the smallest nodes in the tree are too small to be reliable. Since we do not have the data to regrow the tree, let us pretend that the node sizes are adequate and concentrate on the interpretation instead. Since the main objective of Chen et al. appears to be to identify active nodes (i.e., those with high potencies), a small, inactive node is not of great concern.
First, there is one highly active node in Fig. 14.3. There are also two highly active nodes on the left-hand side, which are not shown in Fig. 14.3. Supported by the literature, [15] postulated that there might be different mechanisms of action because the active nodes contain compounds of very different characteristics. This is similar to the hypothesis suggested by Fig. 14.1 that the colon cancer tissues might be biologically heterogeneous. Chen et al. concluded further that their tree demonstrates the ability to detect multiple mechanisms of action coexisting in a large three-dimensional chemical data set. In addition, the selected atom pair descriptors reveal interesting features of the monoamine oxidase (MAO) inhibitors. For instance, the "aromatic ring center-triple bond center" pair in the first split is the structural characteristic of pargyline, a well-known MAO inhibitor.
We can see from these examples that tree-based methods tend to yield integrated, intuitive results whose pieces are consistent with prior findings. We can use trees not only for prediction but also to identify potentially important mechanisms or pathways for further investigation.