Extended Abstract
Background:
Investigating whether a single nucleotide polymorphism (SNP) is functionally involved in a disease is important for disease gene mapping. For complex diseases the problem is harder because, unlike Mendelian diseases, their genetic causes may involve hundreds of genes and alleles. Although millions of SNPs have been deposited in public SNP databases, only a small proportion of them are functional polymorphisms that contribute to disease phenotypes. Prioritizing SNPs by their phenotypic risk is therefore essential for association studies. Assessing that risk requires up-to-date data about each candidate SNP, which in turn requires access to a variety of heterogeneous biological databases and analytical tools.
Methods:
FASTSNP (Function Analysis and Selection Tool for Single Nucleotide Polymorphisms) is a web server that allows users to efficiently identify the SNPs most likely to have functional effects. It prioritizes SNPs according to twelve phenotypic risks and putative functional effects, such as changes to the transcriptional level, pre-mRNA splicing, and protein structure. A unique feature of FASTSNP is that its predictions of functional effects are always based on the most up-to-date information, which FASTSNP extracts at query time from eleven external Web servers using a team of reconfigurable Web wrapper agents. These agents automate Web browsing and data extraction, and they can be easily configured and maintained with a tool that uses a machine learning algorithm, so users can configure or repair a Web wrapper agent without programming. Using Web wrapper agents also makes FASTSNP extensible: new functions can be added simply by deploying more agents. In this manner we have already added several new functions, such as including haplotype block information from HapMap, checking the sequence quality of submitted SNPs by mapping them onto the UCSC Golden Path sequence, and integrating both NCBI and Ensembl annotations. In addition to SNP prioritization, FASTSNP provides project management services that let registered users store and export their candidate SNPs and update the SNPs' putative functional effects by re-submitting the query.
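The prioritization step can be illustrated with a minimal Python sketch. It assumes each candidate SNP has already been annotated with a list of putative functional effects, and it ranks SNPs by their most severe effect. The effect labels, the `HYPOTHETICAL_RISK` weights, and the `CandidateSNP` and `prioritize` names are illustrative assumptions only; they are not FASTSNP's twelve risk categories or its actual scoring rules.

```python
from dataclasses import dataclass, field

# Hypothetical severity weights for illustration only; these are NOT
# FASTSNP's published risk categories or risk levels.
HYPOTHETICAL_RISK = {
    "protein_structure_change": 4,
    "splicing_change": 3,
    "transcription_factor_site_change": 2,
    "no_known_effect": 0,
}


@dataclass
class CandidateSNP:
    snp_id: str  # placeholder identifier (hypothetical), e.g. a dbSNP rs number
    predicted_effects: list[str] = field(default_factory=list)

    def risk(self) -> int:
        # Score a SNP by its most severe putative effect; unknown effect
        # labels and SNPs with no predictions score 0.
        return max(
            (HYPOTHETICAL_RISK.get(e, 0) for e in self.predicted_effects),
            default=0,
        )


def prioritize(snps: list[CandidateSNP]) -> list[CandidateSNP]:
    # Highest risk first; sorted() is stable even with reverse=True,
    # so SNPs with equal risk keep their input order.
    return sorted(snps, key=lambda s: s.risk(), reverse=True)


if __name__ == "__main__":
    ranked = prioritize([
        CandidateSNP("snp_A", ["no_known_effect"]),
        CandidateSNP("snp_B", ["splicing_change", "transcription_factor_site_change"]),
        CandidateSNP("snp_C", ["protein_structure_change"]),
    ])
    print([s.snp_id for s in ranked])  # ['snp_C', 'snp_B', 'snp_A']
```

Scoring by the maximum rather than the sum keeps a SNP with one severe effect ahead of a SNP with several mild ones; whether FASTSNP combines effects this way is not stated here.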
Connected Web Servers:
| Web server | Usage |
|---|---|
| NCBI dbSNP | Provides a SNP's location within a gene, its alleles, allele frequency, and context sequence. |
| Ensembl | Provides a cross-reference/alternative data source for dbSNP. |
| TFSearch | Predicts whether a non-coding SNP alters a transcription factor binding site of a gene. |
| PolyPhen | Predicts whether a non-synonymous SNP, by changing an amino acid, causes a structural change in the protein (damaging or benign). |
| ESEfinder | Predicts whether a synonymous SNP lies in an exonic splicing enhancer (ESE) motif that the alternative allele would weaken. |
| Rescue-ESE | Provides a cross-reference/alternative data source for ESEfinder. |
| NCBI GenBank | Provides all spliced mRNA forms of the gene and their translated proteins. |
| SwissProt | Provides protein domain information used to determine whether a SNP causes alternative splicing that abolishes a protein domain. |
| UCSC Golden Path | Provides the final draft assembly of the genome sequence (the Golden Path), used for quality control of candidate SNPs. |
| NCBI Blast | Sequence comparison and search tool, used for quality control of candidate SNPs. |
| HapMap | Provides information about the haplotypes and linkage disequilibrium around a SNP. |
| FAS-ESS | Predicts whether a coding SNP will abolish exonic splicing silencer (ESS) motifs. |
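As a rough illustration of how the servers in the table divide the work, the following Python sketch routes a SNP to the servers relevant to its coding class. The three-way `SNPClass` split, the server grouping, and the `servers_for` helper are a simplified, hypothetical reading of the table and do not reproduce FASTSNP's actual query logic. NCBI GenBank and SwissProt are left out because the table does not tie them to a single SNP class.

```python
from enum import Enum


class SNPClass(Enum):
    NON_CODING = "non-coding"
    SYNONYMOUS = "synonymous"
    NON_SYNONYMOUS = "non-synonymous"


# Servers consulted for every SNP (hypothetical grouping): annotation
# (dbSNP, Ensembl), sequence quality control (UCSC Golden Path, NCBI Blast),
# and haplotype context (HapMap).
COMMON_SERVERS = ["NCBI dbSNP", "Ensembl", "UCSC Golden Path", "NCBI Blast", "HapMap"]

# Class-specific predictors as described in the table. FAS-ESS applies to
# coding SNPs, so it appears under both coding classes.
CLASS_SERVERS = {
    SNPClass.NON_CODING: ["TFSearch"],
    SNPClass.SYNONYMOUS: ["ESEfinder", "Rescue-ESE", "FAS-ESS"],
    SNPClass.NON_SYNONYMOUS: ["PolyPhen", "FAS-ESS"],
}


def servers_for(snp_class: SNPClass) -> list[str]:
    # List concatenation returns a new list, so callers cannot
    # accidentally mutate the shared module-level tables.
    return COMMON_SERVERS + CLASS_SERVERS[snp_class]


if __name__ == "__main__":
    for snp_class in SNPClass:
        print(snp_class.value, "->", servers_for(snp_class))
```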
Results:
FASTSNP allows users to conveniently select functional polymorphisms for association studies. Currently,
our collaborating institute, the National Genotyping
Center (NGC), Academia Sinica,
Availability: FASTSNP is freely available at
http://fastsnp.ibms.sinica.edu.tw/.
Registration is required for project management services.
Acknowledgements:
This project was
supported in part by the National
Research Program in Genomic Medicine (NRPGM), National Science Council,
Copyright © 2005
Institute of Biomedical Sciences and