Candidate Search — candidate

Performs a heuristic search over a set of binary features to determine whether there are features whose union is more skewed (enriched at the extremes) than any of the features alone. This is the main functionality of the CaDrA package.
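As a rough illustration of what "more skewed than the features alone" means, the base R sketch below compares two toy binary features with their union (logical OR) using a two-sample KS test. This is not CaDrA's implementation: the data, the feature names f1 and f2, and the helper ks_pvalue() are all hypothetical.

```r
# Toy data: 20 samples whose scores are simply 1..20 (hypothetical)
scores <- seq_len(20)

# Two binary features, each marking a different subset of low-scoring samples
f1 <- as.integer(scores %in% c(1, 2, 3))
f2 <- as.integer(scores %in% c(4, 5, 6))

# Union (logical OR) of the two features
f_union <- as.integer(f1 | f2)

# Hypothetical helper: two-sided KS p-value comparing the scores of samples
# with the feature (1) against samples without it (0)
ks_pvalue <- function(feature, scores) {
  ks.test(scores[feature == 1], scores[feature == 0])$p.value
}

p_f1 <- ks_pvalue(f1, scores)
p_f2 <- ks_pvalue(f2, scores)
p_union <- ks_pvalue(f_union, scores)

# In this toy case the union is more significant than either feature alone
print(c(f1 = p_f1, f2 = p_f2, union = p_union))
```

In this constructed example the union covers the six lowest-scoring samples, so its p-value is smaller than that of either feature on its own.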

candidate_search(
  FS,
  input_score,
  method = c("ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer", "custom"),
  custom_function = NULL,
  custom_parameters = NULL,
  alternative = c("less", "greater", "two.sided"),
  weight = NULL,
  search_start = NULL,
  top_N = 1,
  search_method = c("both", "forward"),
  max_size = 7,
  best_score_only = FALSE,
  do_plot = FALSE,
  do_check = TRUE,
  verbose = FALSE
)

Arguments

FS

a SummarizedExperiment class object from the SummarizedExperiment package, where rows represent features of interest (e.g. genes, transcripts, exons, etc.) and columns represent samples. The assay of FS contains binary (1/0) values indicating the presence/absence of omics features.

input_score

a vector of continuous scores representing a phenotypic readout of interest such as protein expression, pathway activity, etc.

NOTE: The input_score object must have names or labels that match the column names of the FS object.
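One way to check this requirement before running a search is sketched below in base R. A plain matrix stands in for the assay of FS, and all object and sample names are hypothetical. Whether the function also requires the scores to be in the same order as the columns is not stated here, so the reordering step is shown only as a precaution.

```r
# Hypothetical stand-in for the assay of FS: 3 features x 4 samples
FS_mat <- matrix(
  c(1, 0, 0, 1,
    0, 1, 0, 0,
    1, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat_A", "feat_B", "feat_C"), paste0("sample_", 1:4))
)

# Scores supplied in a different sample order, but with matching names
input_score <- c(sample_3 = 0.2, sample_1 = 1.5, sample_4 = -0.7, sample_2 = 0.9)

# Check that both objects refer to exactly the same set of samples
names_match <- setequal(names(input_score), colnames(FS_mat))

# Reorder the scores to follow the column order of the feature matrix
input_score_aligned <- input_score[colnames(FS_mat)]
```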

method

a character string specifying the scoring method used in the search. There are 6 options: "ks_pval", "ks_score", "wilcox_pval", "wilcox_score", "revealer" (conditional mutual information from REVEALER), or "custom" (a customized scoring method). Default is "ks_pval".

custom_function

if method is "custom", specifies the name of the customized function here. Default is NULL.

NOTE: custom_function() must take FS_mat (or FS) and input_score as its input arguments, and it must return a vector of row-wise scores, ordered from most significant to least significant, whose names match the row names of the FS_mat (or FS) object.

custom_parameters

if method is "custom", specifies a list of additional arguments (excluding FS_mat (or FS) and input_score) to be passed to the custom_function(). Default is NULL.
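A minimal sketch of a function that follows this contract is shown below. The function name mean_diff_score, its scoring rule (difference in mean score between samples with and without the feature, larger treated as more significant), and the extra argument min_hits are all hypothetical and chosen only for illustration.

```r
# Hypothetical custom scoring function. For each feature (row of FS_mat),
# compute mean(score | feature present) - mean(score | feature absent).
# 'min_hits' is an illustrative extra argument of the kind that would be
# supplied through custom_parameters.
mean_diff_score <- function(FS_mat, input_score, min_hits = 1) {
  scores <- apply(FS_mat, 1, function(feature) {
    hits <- feature == 1
    # Features with too few hits, or present in every sample, score worst
    if (sum(hits) < min_hits || all(hits)) return(-Inf)
    mean(input_score[hits]) - mean(input_score[!hits])
  })
  # Named vector of row-wise scores, most significant first
  sort(scores, decreasing = TRUE)
}

# Toy inputs: 3 features x 6 samples (hypothetical)
FS_mat <- matrix(
  c(1, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1,
    1, 0, 0, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat_A", "feat_B", "feat_C"), paste0("s", 1:6))
)
input_score <- setNames(c(6, 5, 4, 3, 2, 1), paste0("s", 1:6))

result <- mean_diff_score(FS_mat, input_score, min_hits = 2)
print(result)

# Assumed usage with the arguments documented on this page (not run here):
# candidate_search(FS = ..., input_score = ..., method = "custom",
#                  custom_function = mean_diff_score,
#                  custom_parameters = list(min_hits = 2))
```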

alternative

a character string specifying the alternative hypothesis ("two.sided", "greater", or "less"). Default is "less" for left-skewed significance testing.

NOTE: This argument applies to the KS and Wilcoxon methods.

weight

if method is "ks_score" or "ks_pval", specifying a vector of weights will perform a weighted KS test. Default is NULL.

search_start

a list of character strings (separated by commas) specifying the names of features within the FS object to start the search with. If search_start is provided, the top_N parameter is ignored. Default is NULL.

top_N

an integer specifying the number of features to start the search over, taken from the top N ranked features. Because top_N always has a value, it is used only when search_start is NULL. Default is 1.

search_method

a character string specifying the search algorithm used to select the best features ("forward" or "both"). Default is "both" (i.e. backward and forward).

max_size

an integer specifying the maximum number of features that a meta-feature can grow to in a given search. Default is 7.
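To make the forward step and the size cap more concrete, here is a simplified greedy forward search in base R. It is only a sketch under stated assumptions, not the package's algorithm: it omits the backward step entirely, scores features with a two-sided KS p-value, and uses hypothetical names (score_feature, forward_search) and toy data.

```r
# Hypothetical scorer: two-sided KS p-value (smaller = better)
score_feature <- function(feature, input_score) {
  hits <- feature == 1
  if (!any(hits) || all(hits)) return(1)
  ks.test(input_score[hits], input_score[!hits])$p.value
}

# Greedy forward search: starting from one feature, repeatedly add the
# feature whose union with the current meta-feature improves the score
# the most, until nothing improves or max_size features are selected
forward_search <- function(FS_mat, input_score, start, max_size = 7) {
  selected <- start
  meta <- FS_mat[start, ]
  best <- score_feature(meta, input_score)
  while (length(selected) < max_size) {
    candidates <- setdiff(rownames(FS_mat), selected)
    if (length(candidates) == 0) break
    cand_scores <- vapply(candidates, function(f) {
      score_feature(as.integer(meta | FS_mat[f, ]), input_score)
    }, numeric(1))
    if (min(cand_scores) >= best) break  # no candidate improves the score
    winner <- names(which.min(cand_scores))
    selected <- c(selected, winner)
    meta <- as.integer(meta | FS_mat[winner, ])
    best <- min(cand_scores)
  }
  list(features = selected, score = best)
}

# Toy data: 20 samples with scores 1..20 and 4 binary features
input_score <- seq_len(20)
FS_mat <- rbind(
  f1 = as.integer(input_score %in% 1:3),
  f2 = as.integer(input_score %in% 4:6),
  f3 = as.integer(input_score %in% 18:20),
  f4 = as.integer(input_score %in% 10:11)
)

start_score <- score_feature(FS_mat["f1", ], input_score)
res <- forward_search(FS_mat, input_score, start = "f1", max_size = 2)
print(res)
```

With max_size = 2 the search can add at most one feature to the starting feature; in this toy data it picks f2, whose union with f1 covers the six lowest-scoring samples.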

best_score_only

a logical value indicating whether to return only the best score for each of the top N searches. Default is FALSE.

do_plot

a logical value indicating whether to plot the overlapping features of the resulting meta-feature matrix. Default is FALSE.

NOTE: A plot can only be produced if the resulting meta-feature matrix contains more than 1 feature (e.g. length(search_start) > 1 or top_N > 1).

do_check

a logical value indicating whether to validate the given inputs (FS and input_score). Default is TRUE.

verbose

a logical value indicating whether to print diagnostic messages. Default is FALSE.

Value

If best_score_only is set to TRUE, the function returns a list of objects containing only the best score of the union meta-feature matrix for each of the top N searches. If best_score_only is set to FALSE, it returns a list of objects containing the meta-feature matrix, its corresponding best score, and the observed input scores.

Details

NOTE: The legacy function topn_eval() is equivalent to candidate_search(), which is the recommended function.

Examples


# Load the package that provides candidate_search() and the example data
library(CaDrA)

# Load pre-computed feature set
data(sim_FS)

# Load pre-computed input scores
data(sim_Scores)

# Define additional parameters and run the function
candidate_search_result <- candidate_search(
  FS = sim_FS, input_score = sim_Scores, 
  method = "ks_pval", alternative = "less", weight = NULL, 
  search_start = NULL, top_N = 3, search_method = "both",
  max_size = 7, best_score_only = FALSE
)