R/calc_rowscore.R
calc_rowscore.Rd
Calculate row-wise scores of a given binary feature set based on a given scoring method
a matrix of binary features where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent samples.
a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc.
NOTE: the input_score object must have names or labels that match the column names of the FS_mat object.
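The name-matching requirement can be checked before the function is called. The following is a minimal base R sketch on made-up data; the object names mirror the arguments, and the reordering step is a precaution, since this page does not state whether calc_rowscore aligns the names itself.

```r
# Illustrative 2 x 4 binary feature matrix (made-up data)
FS_mat <- matrix(c(1, 0, 0, 1,
                   0, 1, 1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("feat_A", "feat_B"),
                                 c("s1", "s2", "s3", "s4")))

# Scores supplied in a different sample order than the matrix columns
input_score <- c(s3 = 0.2, s1 = -1.3, s4 = 0.8, s2 = 1.1)

# Every matrix column needs a matching score name, and vice versa
names_match <- setequal(names(input_score), colnames(FS_mat))
stopifnot(names_match)

# Align the scores to the column order of the matrix
input_score <- input_score[colnames(FS_mat)]
```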
a character string specifying the scoring method used in the search. There are 6 options: "ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer" (conditional mutual information from REVEALER), or "custom" (a customized scoring method). Default is "ks_pval".
if method is "custom", specifies the name of the customized function here. Default is NULL.
NOTE: custom_function() must take FS_mat and input_score as its input arguments, and it must return a vector of row-wise scores, ordered from most significant to least significant, whose labels or names match the row names of the FS_mat object.
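The requirements on the custom function's return value can be tested directly. The sketch below uses a hypothetical helper, check_custom_scores(), which is not part of the package, and it assumes that larger scores mean more significant, so "most to least significant" is read as decreasing order.

```r
# Hypothetical helper (not part of the package): TRUE when `scores` is a
# named numeric vector, its names are row names of FS_mat, and its values
# are in non-increasing order (assumes larger = more significant)
check_custom_scores <- function(scores, FS_mat) {
  is.numeric(scores) &&
    !is.null(names(scores)) &&
    all(names(scores) %in% rownames(FS_mat)) &&
    !is.unsorted(rev(scores))
}

# Made-up feature matrix with two named rows
FS_mat <- matrix(c(1, 0, 1,
                   0, 1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("feat_A", "feat_B"), NULL))

good <- c(feat_B = 2.5, feat_A = 0.7)  # named, sorted high to low
bad  <- c(feat_A = 0.7, feat_B = 2.5)  # named, but sorted low to high

good_ok <- check_custom_scores(good, FS_mat)
bad_ok  <- check_custom_scores(bad, FS_mat)
```

If a custom method returns p-values rather than transformed scores, the ordering check would need to be reversed.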
if method is "custom", specifies a list of additional arguments (excluding FS_mat and input_score) to be passed to custom_function. Default is NULL.
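This page does not say how the list of additional arguments reaches the custom function. A common base R pattern for forwarding such a list is do.call(), and the sketch below assumes that mechanism. The function my_score() and its scale argument are hypothetical and used only for illustration.

```r
# Hypothetical custom scoring function with one extra argument, `scale`
my_score <- function(FS_mat, input_score, scale = 1) {
  # Mean score of the samples carrying each feature, multiplied by `scale`
  scores <- apply(FS_mat, 1, function(r) mean(input_score[r == 1])) * scale
  sort(scores, decreasing = TRUE)
}

# Made-up inputs
FS_mat <- matrix(c(1, 1, 0, 0,
                   0, 0, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("feat_A", "feat_B"),
                                 c("s1", "s2", "s3", "s4")))
input_score <- c(s1 = 1, s2 = 3, s3 = 5, s4 = 7)

# Extra arguments go in a named list that excludes FS_mat and input_score
custom_parameters <- list(scale = 10)

# Assumed forwarding mechanism: fixed inputs plus the extra argument list
result <- do.call(
  my_score,
  c(list(FS_mat = FS_mat, input_score = input_score), custom_parameters)
)
```

Naming the list elements after the function's formal arguments keeps the call independent of argument order.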
a character string specifying the alternative hypothesis for testing: "two.sided", "greater", or "less". Default is "less" for left-skewed significance testing.
NOTE: This argument applies to the KS and Wilcoxon methods.
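To see what the alternative hypothesis controls, the underlying base R tests can be run directly. This is a sketch on simulated data using stats::ks.test() and stats::wilcox.test(); whether calc_rowscore calls these exact functions internally is an assumption. Note that the two tests read the direction differently: in ks.test(), "less" means the CDF of x lies below that of y (x tends to be larger), whereas in wilcox.test(), "less" means x is shifted to the left of y (x tends to be smaller).

```r
set.seed(1)
x <- rnorm(20, mean = 1)  # e.g. scores of samples carrying a feature
y <- rnorm(30, mean = 0)  # e.g. scores of samples lacking it

# Kolmogorov-Smirnov test under two alternatives
ks_less <- ks.test(x, y, alternative = "less")
ks_two  <- ks.test(x, y, alternative = "two.sided")

# Wilcoxon rank-sum test under the two one-sided alternatives
wil_less    <- wilcox.test(x, y, alternative = "less")
wil_greater <- wilcox.test(x, y, alternative = "greater")

# Collect the p-values for side-by-side comparison
pvals <- c(ks_less     = ks_less$p.value,
           ks_two      = ks_two$p.value,
           wil_less    = wil_less$p.value,
           wil_greater = wil_greater$p.value)
```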
if method is "ks_score" or "ks_pval", specifying a vector of weights performs a weighted KS test. Default is NULL.
a vector of one or more features representing known “causes” of activation or features associated with a response of interest. It applies to method = "revealer" only.
a logical value indicating whether to validate the given parameters (FS_mat and input_score). Default is TRUE.
a logical value indicating whether to print diagnostic messages. Default is FALSE.
additional parameters to be passed to custom_function
a vector of row-wise scores, ordered from most significant to least significant (e.g. from highest to lowest values), whose labels or names match the row names of the FS_mat object.
# Create a feature matrix
mat <- matrix(c(1,0,1,0,0,0,0,0,1,0,
                0,0,1,0,1,0,1,0,0,0,
                0,0,0,0,1,0,1,0,1,0), nrow=3)
colnames(mat) <- 1:10
row.names(mat) <- c("TP_1", "TP_2", "TP_3")
# Create a vector of observed input scores
set.seed(42)
input_score <- rnorm(n = ncol(mat))
names(input_score) <- colnames(mat)
# Run the ks method
ks_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "ks_pval",
  weight = NULL,
  alternative = "less"
)
# Run the wilcoxon method
wilcox_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "wilcox_pval",
  alternative = "less"
)
# Run the revealer method
revealer_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "revealer",
  seed_names = NULL
)
# A customized function using ks-test function
customized_rowscore <- function(FS_mat, input_score, alternative = "less") {
  # Run a two-sample KS test for each feature (row): scores of samples
  # with the feature (1) versus scores of samples without it (0)
  ks <- apply(FS_mat, 1, function(r) {
    x <- input_score[which(r == 1)]
    y <- input_score[which(r == 0)]
    res <- ks.test(x, y, alternative = alternative)
    return(c(res$statistic, res$p.value))
  })
  # Obtain score statistics and p-values from KS method
  stat <- ks[1, ]
  pval <- ks[2, ]
  # Compute the -log scores for pval
  # Make sure scores has names that match the row names of FS_mat object
  scores <- -log(pval)
  names(scores) <- rownames(FS_mat)
  # Remove scores that are Inf, which result from taking -log(0).
  # They are uninformative.
  scores <- scores[scores != Inf]
  # Re-order scores in decreasing order (from most to least significant)
  # This comes in handy when doing the top-N evaluation of
  # the top N 'best' features
  scores <- scores[order(scores, decreasing = TRUE)]
  return(scores)
}
# Search for best features using a custom-defined function
custom_rowscore_result <- calc_rowscore(
  FS_mat = mat,
  input_score = input_score,
  method = "custom",
  custom_function = customized_rowscore,
  custom_parameters = NULL
)