Clinical data mining: What are the challenges?
- 10th January 2023
- Posted by: Breige McBride
- Category: Bioinformatics
Considering using clinical data mining to advance your drug discovery and development research? This blog details the major issues in data mining to be aware of, and how to overcome them.
What is Clinical Data Mining?
Clinical data mining means applying data mining techniques to clinical research data to find insights that advance research in fields such as medicine and drug development. For example, you might search cancer clinical trial data sets for an association between a certain gene mutation and a specific type of cancer.
Challenges in clinical data mining
Knowing where to look
One of the major issues in data mining, clinical or otherwise, is knowing which database to use and where to look for data. Clinical data mining focuses on biological data sets, and the vast amount of biological data available in the public domain makes this a challenge. Finding relevant data sets to help answer your research question can be like searching for a needle in a haystack.
To help you with this challenge, here are some public data sets that we often mine for our clients:
- The Cancer Genome Atlas (TCGA)
- Cancer Dependency Map (DepMap)
- Cancer Cell Line Encyclopedia (CCLE)
- Gene Expression Omnibus (GEO)
- European Nucleotide Archive (ENA)
- Expression Atlas
- Database of Immune Cell eQTLs, Expression, Epigenomics (DICE)
- cBioPortal for Cancer Genomics
Lengthy time requirements
Even once you have an idea of where to look, conducting a search can still take a significant amount of time. This is another of the major issues in data mining. Because so much biological data is available, searching the many journal abstracts and data sets for relevant search terms or metadata can be a slow process.
To overcome this challenge, it is important to carefully curate a list of search terms relevant to your research question to return only relevant results. For example, you should consider criteria like the data type, disease, species and genes of interest that you would like publications in your search results to focus on.
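As a rough illustration of curated criteria in practice, the sketch below filters a list of data set records by data type, disease, species and genes of interest. Everything in it is an assumption made for the example: the `DatasetRecord` and `SearchCriteria` classes, the `matches` function and the `EXAMPLE-…` accessions are hypothetical and do not correspond to any repository's API or to real data sets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical summary of one data set returned by a search."""
    accession: str
    data_type: str
    disease: str
    species: str
    genes: set[str] = field(default_factory=set)


@dataclass
class SearchCriteria:
    """Curated criteria; text values are stored in lower case."""
    data_types: set[str]
    diseases: set[str]
    species: set[str]
    genes: set[str]


def matches(record: DatasetRecord, criteria: SearchCriteria) -> bool:
    """Keep a record only if it satisfies every criterion."""
    return (
        record.data_type.lower() in criteria.data_types
        and record.disease.lower() in criteria.diseases
        and record.species.lower() in criteria.species
        and bool(record.genes & criteria.genes)  # at least one gene of interest
    )


# Made-up records, for illustration only.
records = [
    DatasetRecord("EXAMPLE-001", "RNA-seq", "Lung adenocarcinoma", "Homo sapiens", {"KRAS", "EGFR"}),
    DatasetRecord("EXAMPLE-002", "RNA-seq", "Lung adenocarcinoma", "Mus musculus", {"KRAS"}),
    DatasetRecord("EXAMPLE-003", "Microarray", "Breast cancer", "Homo sapiens", {"BRCA1"}),
]

criteria = SearchCriteria(
    data_types={"rna-seq"},
    diseases={"lung adenocarcinoma"},
    species={"homo sapiens"},
    genes={"KRAS"},
)

selected = [r.accession for r in records if matches(r, criteria)]
print(selected)
```

Requiring every criterion to hold keeps the result list short; relaxing one criterion at a time is a simple way to widen the search if too little comes back.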
Handling data
Storing and accessing the data sets which your search returns presents its own challenges. Biological data sets are usually large to begin with. Moreover, your search may return many data sets, which combined can easily contain data generated from tens of thousands of biological samples. The computational resources required to store and sift through so much data are not insignificant. If you do not have adequate computational resources available, outsourcing your data mining project may be the most economical option.
Another challenge in clinical data mining is assessing data quality. We recommend paying particular attention to the metadata of a data set to get a clearer picture of exactly what the data set contains and to ensure it matches the associated research publication that your search returned.
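One simple way to make this metadata check repeatable is to compare a data set's metadata against the values reported in its publication. The sketch below assumes both have already been transcribed into plain dictionaries; the `check_metadata` function, the field names and the values are all hypothetical and chosen only for illustration.

```python
def check_metadata(metadata: dict, expected: dict) -> list[str]:
    """Return a list of discrepancies between metadata and publication values."""
    problems = []
    for key, want in expected.items():
        if key not in metadata:
            problems.append(f"missing field: {key}")
        elif metadata[key] != want:
            problems.append(
                f"{key}: metadata has {metadata[key]!r}, publication reports {want!r}"
            )
    return problems


# Made-up values: what the repository metadata says...
metadata = {"species": "Homo sapiens", "sample_count": 48}
# ...versus what the associated publication reports.
expected = {"species": "Homo sapiens", "sample_count": 50, "platform": "RNA-seq"}

problems = check_metadata(metadata, expected)
for problem in problems:
    print(problem)
```

An empty list means the fields you checked agree. It does not prove that the data set is of high quality, only that it is consistent with the publication on those points.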
A final issue to consider is the challenge of combining data sets. For example, combining in-house data with public domain data can help increase overall statistical power. However, amalgamating data from different repositories can be challenging as different databases collect their information in different ways. This can mean that similar information in similar data sets is found in entirely different places.
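To illustrate the "different places" problem, the sketch below maps two sources that store the same information under different field names onto one shared schema before combining them. The source names, field names, `FIELD_MAPS` table and `harmonise` function are hypothetical, and real repositories will usually need more than renaming (for example, unit conversion or controlled vocabularies).

```python
# Hypothetical mapping from each source's field names to a shared schema.
FIELD_MAPS = {
    "in_house": {"sample": "sample_id", "dx": "disease", "organism": "species"},
    "public": {"Sample_ID": "sample_id", "disease_state": "disease", "Species": "species"},
}


def harmonise(row: dict, source: str) -> dict:
    """Rename a row's fields to the shared schema and record where it came from."""
    mapping = FIELD_MAPS[source]
    out = {common: row[original] for original, common in mapping.items() if original in row}
    out["source"] = source
    return out


# Made-up rows, for illustration only.
in_house_rows = [{"sample": "S1", "dx": "melanoma", "organism": "Homo sapiens"}]
public_rows = [{"Sample_ID": "P7", "disease_state": "melanoma", "Species": "Homo sapiens"}]

combined = [harmonise(r, "in_house") for r in in_house_rows] + [
    harmonise(r, "public") for r in public_rows
]
for row in combined:
    print(row)
```

Keeping a `source` field on every row makes it possible to account for differences between repositories later in the analysis.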
Data mining as a service
An easy way to address these clinical data mining challenges is to have an expert provider support your data mining project. Companies that specialise in data mining as a service have the tools and knowledge needed to overcome the challenges above. At Fios Genomics, data mining is one of our most popular services, so we have a wealth of experience in completing data mining projects. We can quickly curate data sets relevant to your research from a wide range of publicly available sources and databases. We also have experience with meta-analyses that combine data and outcomes from several studies to increase the overall statistical power. If you have a clinical data mining project you would like to discuss, then contact us today! Alternatively, you can visit our data mining service page to learn more about our data mining and landscaping capabilities and experience.