Title: | Cumulative Percent Decay Curve Generator |
---|---|
Description: | Calculates and visualises cumulative percent 'decay' curves, which are typically calculated from metagenomic taxonomic profiles. These curves can be used to estimate the expected level of 'endogenous' taxa at different abundance levels in metagenomic samples, when compared against samples of known sampling site or source. The method is described in Fellows Yates, J. A. et al. (2021) Proceedings of the National Academy of Sciences USA <doi:10.1073/pnas.2021655118>. |
Authors: | James A. Fellows Yates [aut, cre] |
Maintainer: | James A. Fellows Yates |
License: | MIT + file LICENSE |
Version: | 1.1.0.9000 |
Built: | 2025-03-12 04:17:13 UTC |
Source: | https://github.com/jfy133/cuperdec |
Automates the selection of a per-sample 'burn-in' based on the shape of the sample's own curve (rather than a user-supplied hard value), by finding the point from which the 'fluctuation' of the curve no longer exceeds the mean ± SD of the whole curve.
adaptive_burnin_filter(curves, percent_threshold)
Argument | Description |
---|---|
curves | A cuperdec curve table calculated with `calculate_curve()`. |
percent_threshold | A percentage of the target source in a sample above which the sample is considered 'retained'. |

A tibble with one row per sample, indicating whether each sample passed the specified filter.
```r
library(cuperdec)
# Load the example data shipped with the package
data(cuperdec_taxatable_ex)
data(cuperdec_database_ex)
taxa_table <- load_taxa_table(cuperdec_taxatable_ex)
iso_database <- load_database(cuperdec_database_ex, target = "oral")
# Build the curves, then apply the adaptive burn-in filter
curve_results <- calculate_curve(taxa_table, iso_database)
adaptive_burnin_filter(curve_results, percent_threshold = 0.1)
```
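The adaptive burn-in rule can be sketched in base R for a single curve. This is a minimal illustration under stated assumptions, not cuperdec's implementation: it assumes 'fluctuation' means the signed change in cumulative percentage between consecutive ranks, that the band is the mean ± SD of those changes over the whole curve, and that the burn-in ends at the first rank after which every later change stays inside the band. The function name `adaptive_burnin_rank()` is hypothetical, and the sketch does not apply `percent_threshold`.

```r
# Hypothetical helper (not part of cuperdec): estimate where the burn-in ends
# for one sample's curve, under the assumptions stated above.
# percentages: cumulative target-source percentages ordered by abundance rank
adaptive_burnin_rank <- function(percentages) {
  stopifnot(is.numeric(percentages), length(percentages) >= 3)
  # ASSUMED 'fluctuation': signed change from rank i to rank i + 1
  fluctuation <- diff(percentages)
  band_centre <- mean(fluctuation)
  band_width <- sd(fluctuation)
  # Changes falling outside mean +- SD of all changes
  outside <- abs(fluctuation - band_centre) > band_width
  if (!any(outside)) {
    return(1L)  # the curve never leaves the band, so no burn-in is needed
  }
  # Change k links ranks k and k + 1, so after the last out-of-band change
  # the curve is treated as settled from rank k + 1 onwards
  max(which(outside)) + 1L
}

curve <- c(100, 50, 66.7, 50, 60, 58, 59, 58.5, 59)
adaptive_burnin_rank(curve)  # returns 3
```

Here only the first two jumps (100 to 50 and 50 to 66.7) fall outside the band, so the sketch places the end of the burn-in at rank 3.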
Calculates the initial decay curve for each sample: the percentage of taxa deriving from the 'target' isolation source, computed cumulatively along a ranking of taxa from most to least abundant.
calculate_curve(taxa_table, database)
Argument | Description |
---|---|
taxa_table | An OTU table loaded with `load_taxa_table()`. |
database | A database file loaded with `load_database()`. |

A tibble listing the taxa of each sample ordered by abundance rank, together with the proportion of taxa up to that rank that derive from the target source.
```r
library(cuperdec)
data(cuperdec_taxatable_ex)
data(cuperdec_database_ex)
taxa_table <- load_taxa_table(cuperdec_taxatable_ex)
iso_database <- load_database(cuperdec_database_ex, target = "oral")
calculate_curve(taxa_table, iso_database)
```
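The curve described above can be illustrated with a base R sketch for one sample. It assumes read counts in a named numeric vector and a character vector of taxa assigned to the target source (both hypothetical inputs); taxa with zero reads are dropped, and taxa missing from the target list count as non-target. The helper `cumulative_percent_curve()` is illustrative only and is not cuperdec's `calculate_curve()`.

```r
# Hypothetical helper (not part of cuperdec): cumulative percent decay curve
# for a single sample.
# counts: named numeric vector of assigned reads per taxon
# target_taxa: character vector of taxa from the target isolation source
cumulative_percent_curve <- function(counts, target_taxa) {
  counts <- counts[counts > 0]                         # keep detected taxa only
  taxa <- names(counts)[order(counts, decreasing = TRUE)]  # most to least abundant
  is_target <- taxa %in% target_taxa
  ranks <- seq_along(taxa)
  data.frame(
    Rank = ranks,
    Taxon = taxa,
    # Percentage of taxa up to and including this rank that are target-source
    Percentage = 100 * cumsum(is_target) / ranks,
    stringsAsFactors = FALSE
  )
}

counts <- c(a = 50, b = 30, c = 10, d = 5, e = 0)
cumulative_percent_curve(counts, target_taxa = c("a", "c"))
# Percentage column: 100, 50, 66.67, 50 (taxon "e" is dropped)
```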
Example isolation source database for input to cuperdec. Species names are from the NCBI nt database, and isolation sources were gathered from the Human Oral Microbiome Database, NCBI GenBank, and manual curation.
data(cuperdec_database_ex)
A TSV table loaded as a tibble.
```r
library(cuperdec)
data(cuperdec_database_ex)
load_database(cuperdec_database_ex, target = "oral")
```
Example metadata map file corresponding to samples in example data "cuperdec_taxatable_ex". Includes a grouping column corresponding to sample species.
data(cuperdec_metadata_ex)
A TSV table loaded as a tibble.
```r
library(cuperdec)
data(cuperdec_metadata_ex)
load_map(cuperdec_metadata_ex, sample_col = "#SampleID", source_col = "Env")
```
Example taxon table for input to cuperdec, based on data including shotgun-sequenced ancient dental calculus samples aligned against the October 2017 NCBI nt database using MALT. Samples are columns, taxa are rows, and counts are assigned reads.
data(cuperdec_taxatable_ex)
A TSV table loaded as a tibble.
```r
library(cuperdec)
data(cuperdec_taxatable_ex)
load_taxa_table(cuperdec_taxatable_ex)
```
Returns a table indicating whether each sample passes a given threshold after ignoring a 'burn-in', specified as a fraction of the abundance ranks.
hard_burnin_filter(curves, percent_threshold, rank_burnin)
Argument | Description |
---|---|
curves | A cuperdec curve table calculated with `calculate_curve()`. |
percent_threshold | A percentage of the target source in a sample above which the sample is considered 'retained'. |
rank_burnin | A number between 0 and 1 indicating the fraction of taxa to ignore before applying the threshold. |

A tibble with one row per sample, indicating whether each sample passed the specified filter.
```r
library(cuperdec)
data(cuperdec_taxatable_ex)
data(cuperdec_database_ex)
taxa_table <- load_taxa_table(cuperdec_taxatable_ex)
iso_database <- load_database(cuperdec_database_ex, target = "oral")
curve_results <- calculate_curve(taxa_table, iso_database)
# Ignore the top 10% of abundance ranks before applying a 50% threshold
hard_burnin_filter(curve_results, percent_threshold = 50, rank_burnin = 0.1)
```
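A single-curve sketch of a hard burn-in filter in base R. The pass criterion is an assumption and is not taken from cuperdec's source: here a sample passes if, after discarding the top `rank_burnin` fraction of ranks, its curve exceeds `percent_threshold` at any remaining rank. `hard_burnin_pass()` is a hypothetical name.

```r
# Hypothetical helper (not part of cuperdec): apply a hard burn-in to one curve.
# percentages: cumulative target-source percentages ordered by abundance rank
# percent_threshold: percentage above which the sample is 'retained'
# rank_burnin: fraction (0 up to, but excluding, 1) of the top ranks to ignore
hard_burnin_pass <- function(percentages, percent_threshold, rank_burnin) {
  stopifnot(rank_burnin >= 0, rank_burnin < 1)
  n_ranks <- length(percentages)
  burnin_ranks <- floor(n_ranks * rank_burnin)  # number of top ranks to skip
  kept <- percentages[seq_len(n_ranks) > burnin_ranks]
  # ASSUMED criterion: exceed the threshold at any rank after the burn-in
  any(kept > percent_threshold)
}

curve <- c(100, 50, 66.7, 40, 45, 48, 49, 47, 46, 45)
hard_burnin_pass(curve, percent_threshold = 50, rank_burnin = 0)    # TRUE
hard_burnin_pass(curve, percent_threshold = 50, rank_burnin = 0.5)  # FALSE
```

Without a burn-in, the 100% at rank 1 (a single target-source taxon) is enough to pass under this assumed criterion; skipping the first half of the ranks removes that early spike.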
Loads a taxon/isolation source database file (first column: taxa; second column: isolation sources) and formats it for downstream analysis.
load_database(x, target)
Argument | Description |
---|---|
x | Path to a TSV file, or a tidy data frame (e.g. a tibble), with at least two columns: one with taxon names and the other indicating whether each taxon derives from the target isolation source. |
target | The string in the 'Isolation Source' (i.e. second) column that is the expected target source of the samples. |

Taxon names should match those in the taxa table.
A tibble, formatted for use in downstream cuperdec functions.
```r
library(cuperdec)
data(cuperdec_database_ex)
iso_database <- load_database(cuperdec_database_ex, target = "oral")
```
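As a rough illustration of how the `target` string relates to the database layout, the base R sketch below builds a small hypothetical two-column table and flags each taxon by exact string match against `target`. The column names `Taxon` and `Isolation_Source` and the example rows are assumptions for illustration; `load_database()` itself may name or format its output differently.

```r
# Hypothetical taxon/isolation source database (first column: taxa,
# second column: isolation sources), mirroring the layout described above
database <- data.frame(
  Taxon = c("Streptococcus sanguinis", "Tannerella forsythia", "Bacillus subtilis"),
  Isolation_Source = c("oral", "oral", "soil"),
  stringsAsFactors = FALSE
)

target <- "oral"
# Flag taxa whose isolation source exactly matches the target string
database$Is_Target <- database$Isolation_Source == target
target_taxa <- database$Taxon[database$Is_Target]
target_taxa
```

Because the match is exact, the taxon names in such a table must be spelled the same way as in the taxa table for the flags to carry over.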
Loads a metadata table and reformats it for downstream analysis. The table must include at least two columns: sample name and sample source.
load_map(x, sample_col, source_col)
Argument | Description |
---|---|
x | Path to a TSV file, or a tidy data frame (e.g. a tibble), with a column containing sample names and other grouping metadata columns. |
sample_col | The name of the column containing the sample names. |
source_col | The name of the column specifying which group or source each sample is from. |

The two required columns must contain the following information:

- Sample name: a unique identifier for each sample.
- Sample source: a grouping ID indicating which 'source' the sample is from. This is used in plotting to separate comparative 'sources' from your own samples.
A tibble, formatted for use in downstream cuperdec functions.
```r
library(cuperdec)
data(cuperdec_metadata_ex)
metadata_table <- load_map(cuperdec_metadata_ex, sample_col = "#SampleID", source_col = "Env")
```
Loads a typical taxa table (samples as columns, taxa as rows) in TSV format, standardises some columns, and returns the table as a tibble.
load_taxa_table(x)
Argument | Description |
---|---|
x | Path to a TSV file, or a tidy data frame (e.g. a tibble), containing an OTU table with samples as columns, except for the first column, which holds the taxon names. |

A tibble, formatted for use in downstream cuperdec functions.
```r
library(cuperdec)
data(cuperdec_taxatable_ex)
taxa_table <- load_taxa_table(cuperdec_taxatable_ex)
```
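The expected input layout can be sketched in base R: a hypothetical two-sample table is written to a temporary TSV file, read back, and reshaped into one row per sample and taxon. The sample names, taxa, and long-format column names are illustrative assumptions; this sketch does not reproduce the column standardisation that `load_taxa_table()` performs.

```r
# Hypothetical taxa table: first column taxon names, one column per sample,
# values are counts of assigned reads
taxa_wide <- data.frame(
  Taxon = c("Streptococcus sanguinis", "Tannerella forsythia", "Bacillus subtilis"),
  SampleA = c(120, 40, 5),
  SampleB = c(0, 10, 300),
  stringsAsFactors = FALSE
)

# Round-trip through a TSV file, since the input is described as TSV
path <- tempfile(fileext = ".tsv")
write.table(taxa_wide, path, sep = "\t", quote = FALSE, row.names = FALSE)
taxa_read <- read.delim(path, check.names = FALSE, stringsAsFactors = FALSE)

# Reshape to long format: one row per taxon-sample pair
sample_cols <- names(taxa_read)[-1]
taxa_long <- data.frame(
  Taxon = rep(taxa_read$Taxon, times = length(sample_cols)),
  Sample = rep(sample_cols, each = nrow(taxa_read)),
  Count = unlist(taxa_read[sample_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
taxa_long
```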
Generates a visual representation of the curves, optionally plotting different groups separately, and indicating which individuals pass the different types of filter.
plot_cuperdec(curves, metadata, burnin_result, restrict_x = 0, facet_cols = NULL)
Argument | Description |
---|---|
curves | Output tibble from `calculate_curve()`. |
metadata | Output from `load_map()`. |
burnin_result | Output from one of the cuperdec filter functions (e.g. `adaptive_burnin_filter()`). |
restrict_x | Restrict the displayed abundance ranks to X ranks, useful for closer inspection of curves (optional). |
facet_cols | Custom number of columns for faceted plots (optional). |

A ggplot2 image object.
```r
library(cuperdec)
data(cuperdec_taxatable_ex)
data(cuperdec_database_ex)
data(cuperdec_metadata_ex)
taxa_table <- load_taxa_table(cuperdec_taxatable_ex)
iso_database <- load_database(cuperdec_database_ex, target = "oral")
metadata_table <- load_map(cuperdec_metadata_ex, sample_col = "#SampleID", source_col = "Env")
curves <- calculate_curve(taxa_table, iso_database)
burnin_results <- adaptive_burnin_filter(curves, percent_threshold = 0.1)
# Plot the curves, grouped by source, marking which samples passed the filter
plot_cuperdec(curves, metadata_table, burnin_results)
```
Returns a table indicating whether each sample passes a given threshold of target-source taxa, without applying a burn-in.
simple_filter(curves, percent_threshold)
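Argument | Description |
---|---|
curves | A cuperdec curve table calculated with `calculate_curve()`. |
percent_threshold | A percentage of the target source in a sample above which the sample is considered 'retained'. |

A tibble with one row per sample, indicating whether each sample passed the specified filter.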
```r
library(cuperdec)
data(cuperdec_taxatable_ex)
data(cuperdec_database_ex)
taxa_table <- load_taxa_table(cuperdec_taxatable_ex)
iso_database <- load_database(cuperdec_database_ex, target = "oral")
curve_results <- calculate_curve(taxa_table, iso_database)
simple_filter(curve_results, percent_threshold = 50)
```
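A sketch of a per-sample simple filter over a long-format curve table, in base R. It assumes columns named `Sample` and `Percentage`, and assumes a sample passes if its curve exceeds `percent_threshold` at any rank, with no burn-in; neither assumption is confirmed from cuperdec's source, and `simple_filter_sketch()` is a hypothetical name.

```r
# Hypothetical helper (not part of cuperdec)
# curves: data.frame with one row per sample and rank, with assumed columns
#         Sample and Percentage
simple_filter_sketch <- function(curves, percent_threshold) {
  # Highest cumulative percentage reached by each sample's curve
  max_pct <- tapply(curves$Percentage, curves$Sample, max)
  data.frame(
    Sample = names(max_pct),
    # ASSUMED criterion: pass if the curve ever exceeds the threshold
    Passed = as.vector(max_pct > percent_threshold),
    stringsAsFactors = FALSE
  )
}

curves <- data.frame(
  Sample = rep(c("s1", "s2"), each = 3),
  Rank = rep(1:3, times = 2),
  Percentage = c(100, 50, 66.7, 0, 50, 33.3)
)
simple_filter_sketch(curves, percent_threshold = 60)
```

Under this assumed criterion, `s1` passes because its first rank is already at 100%, while `s2` never rises above 50%.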