Collagen family genes and related genes might be associated with prognosis of patients with gastric cancer: an integrated bioinformatics analysis and experimental validation
Original Article

Kongyan Weng1,2, Yinger Huang1,2, Hao Deng1,2, Ruixue Wang3, Shuhong Luo3, Hongfeng Wu1,2, Jialing Chen1,2, Mingjian Long4, Wenbo Hao1,2

1Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, Guangzhou, China; 2Guangdong Provincial Key Laboratory of Construction and Detection in Tissue Engineering, Southern Medical University, Guangzhou, China; 3Department of Laboratory Medicine, School of Stomatology and Medicine, Foshan University, Foshan, China; 4Department of Laboratory Medicine, The Fifth Affiliated Hospital, Southern Medical University, Guangzhou, China

Contributions: (I) Conception and design: K Weng, W Hao; (II) Administrative support: W Hao; (III) Provision of study materials: K Weng, Y Huang; (IV) Collection and assembly of data: K Weng; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Wenbo Hao. Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University, Guangzhou, China. Email:

Background: Gastric cancer (GC) is a disease with high morbidity. The purpose of this study was to identify genes essential to GC development in patients and to reveal the underlying mechanisms of progression.

Methods: Bioinformatics analysis is an effective tool for discovering essential genes of different disease states. We used the Gene Expression Omnibus (GEO) database to identify differentially expressed genes (DEGs), the DAVID online tool to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of DEGs, the STRING database to construct the protein-protein interaction (PPI) network of DEGs, the Oncomine and The Cancer Genome Atlas-Stomach Adenocarcinoma (TCGA-STAD) databases to analyze gene expression differences, the Human pan-Cancer Methylation database (MethHC) to compare the DNA methylation of genes, and the Kaplan-Meier plotter to perform survival analysis of DEGs. We performed real-time quantitative PCR (RT-qPCR) experiments to confirm our analysis results.

Results: After the integration of four Gene Expression Series (GSEs), we identified 407 DEGs. GO and KEGG pathway analyses indicated that the upregulated DEGs were significantly enriched in extracellular matrix (ECM)-related functions and pathways, and that the main upregulated DEGs were collagens (COLs). The downregulated DEGs were enriched in ethanol oxidation. Several groups of DEGs, such as the insulin-like growth factor binding protein (IGFBP), collagen (COL) and serpin peptidase inhibitor (SERPIN) gene families, constituted several PPI networks. In the Oncomine database, all of the collagen genes were highly expressed in breast cancer, esophageal cancer, GC, head and neck cancer and pancreatic cancer, compared with normal tissues. Consistently, in the TCGA-STAD database, most of the COLs were highly expressed and exhibited altered methylation in GC patients. In GC patients, high expression of some of these COL genes was related to worse prognosis, as evidenced by the Kaplan-Meier plotter analysis. Our RT-qPCR results showed that collagen type III α1 chain (COL3A1) was highly expressed in GC cells. Collagen type V α1 chain (COL5A1) was also highly expressed, except in AGS cells, which was consistent with our analysis.

Conclusions: Collagen (COL) family genes might serve as progression and prognosis markers of GC.

Keywords: Gastric cancer (GC); bioinformatics analysis; collagens; prognosis; experimental validation

Submitted Mar 30, 2020. Accepted for publication Sep 08, 2020.

doi: 10.21037/tcr-20-1726


Data from the GLOBOCAN database indicates that, globally, there are more than 1,000,000 new cases of gastric cancer (GC) each year, causing an estimated 783,000 deaths in 2018, making it the fifth most frequently diagnosed cancer and the third leading cause of cancer deaths (1). While new treatment strategies and drug developments have made significant progress, due to the low early detection rate of GC, the survival rate of GC patients remains low (2,3). In addition to the existing primary treatments, targeted therapy is expected to be an essential supplementary treatment for advanced GCs (4). Therefore, it is necessary to explore new molecular targets as well as new, highly sensitive and specific biomarkers to elucidate the molecular mechanisms of GC and improve the prognosis of patients with GC.

Recently, bioinformatics analyses (5) have become increasingly popular for analyzing gene expression changes in the progression and development of diseases. For example, the online GEO database is a public functional genomics resource that can be used to analyze experimental gene expression data uploaded by researchers and to identify differentially expressed genes (DEGs) important to disease. The DAVID online database holds information related to proteins and genes, and can be used to mine data for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of these genes. Similarly, differences in gene expression between tumor and normal tissues can be obtained from the TCGA database. STRING is an online database for analyzing protein-protein interaction (PPI) networks. These online databases assist in experimental data integration and identification of important genes. In the present study, using GO enrichment analysis, we found several groups of DEGs in GC patients, including collagens (COLs), alcohol dehydrogenases (ADHs) and N-acetyl galactosyltransferases (GALNTs). Combining the KEGG, GO, and PPI network analysis results, we selected COLs for more in-depth analysis.

According to the HUGO Gene Nomenclature Committee database, collagen genes encode proteins that contain one or more collagen-like domains. Found in vertebrates, this fibrous protein is a significant component of skin, bones, tendons, cartilage, blood vessels and teeth. Moreover, it is a substantial component of the tumor microenvironment and is involved in cancer fibrosis (6,7). Cancer cells can regulate collagen biosynthesis through mutant genes (8), transcription factors (9,10), signaling pathways and receptors (11,12). Furthermore, collagen can affect tumor cell behavior through integrins, discoidin domain receptors, tyrosine kinase receptors and some signaling pathways. In GC, collagen type IV α3 chain (COL4A3) has been identified as a potential prognostic factor (13), but few articles have discussed the relationship between collagen genes and GC (14). Therefore, we performed an in-depth study of the COL gene family’s role in GC in order to expose progression mechanisms and to identify prognostic and progression markers.

We present the following article in accordance with the MDAR checklist.


Microarray data and identification of DEGs

We downloaded four gene expression series (GSE79973, GSE26899, GSE54129 and GSE29272) from the GEO database and screened each series for DEGs between GC and normal samples using GEO2R. Probe sets without corresponding gene symbols were removed, and genes with more than one probe set were averaged. Genes with an adjusted P value <0.01 and |log2 fold change| >1 were considered DEGs. Venn diagrams of the upregulated and downregulated DEGs were created. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
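The screening and overlap steps described above can be illustrated with a minimal Python sketch. It assumes each series has already been exported as rows of (gene symbol, log2 fold change, adjusted P value); the function names, series labels and toy values are hypothetical and are not taken from GEO2R or from the study data.

```python
from typing import Dict, List, Set, Tuple

# Thresholds stated in the text.
ADJ_P_CUTOFF = 0.01
LOG2FC_CUTOFF = 1.0

# One row per gene: (gene symbol, log2 fold change, adjusted P value).
Row = Tuple[str, float, float]


def split_degs(rows: List[Row]) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) gene symbols for one series."""
    up: Set[str] = set()
    down: Set[str] = set()
    for symbol, log2fc, adj_p in rows:
        if adj_p < ADJ_P_CUTOFF and abs(log2fc) > LOG2FC_CUTOFF:
            (up if log2fc > 0 else down).add(symbol)
    return up, down


def overlap(series: Dict[str, List[Row]]) -> Tuple[Set[str], Set[str]]:
    """Intersect the up and down sets across all series (the Venn core)."""
    ups, downs = zip(*(split_degs(rows) for rows in series.values()))
    return set.intersection(*ups), set.intersection(*downs)


# Toy, made-up values for illustration only (not data from the study).
toy = {
    "series_A": [("GENE1", 2.3, 0.001), ("GENE2", -1.8, 0.002), ("GENE3", 0.4, 0.0001)],
    "series_B": [("GENE1", 1.5, 0.004), ("GENE2", -2.1, 0.02), ("GENE3", 1.2, 0.003)],
}
common_up, common_down = overlap(toy)
print(common_up, common_down)
```

In this toy input only GENE1 passes both thresholds in both series, so it is the only gene in the upregulated overlap, and the downregulated overlap is empty.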

KEGG and GO enrichment analyses of DEGs

We used the DAVID online database (version 6.8) to analyze the functions of the identified DEGs; P<0.05 was considered statistically significant.
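To make the idea of an enrichment P value concrete, the following sketch computes a plain hypergeometric over-representation test. This is an illustrative simplification: DAVID reports a modified Fisher exact P value (the EASE score), which is more conservative than the calculation shown here. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p(hits: int, list_size: int, term_size: int, background: int) -> float:
    """Upper hypergeometric tail P(X >= hits).

    hits       -- genes in the submitted list annotated with the term
    list_size  -- number of genes in the submitted list
    term_size  -- genes in the background annotated with the term
    background -- total number of genes in the background
    """
    total = comb(background, list_size)
    upper = min(list_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background - term_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Made-up example: 2 of 4 listed genes carry a term held by 5 of 20 background genes.
p = enrichment_p(hits=2, list_size=4, term_size=5, background=20)
print(round(p, 4))
```

A term would be called enriched under the criterion above when the resulting P value falls below 0.05.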

PPI network construction and module analysis

In the present study, we used STRING (version 11.0) to construct the PPI network of the DEGs, retaining interactions with a combined score >0.9. We used Cytoscape (version 3.7.2) to analyze the molecular interaction networks, and MCODE, a Cytoscape app for finding densely connected regions in a given network, to identify the most significant modules in the PPI networks. The selection criteria were as follows: node score cut-off =0.1, degree cut-off =2, k-core =2 and max depth =100. The genes in each module were analyzed by GO and KEGG using DAVID.
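The edge filtering step, and the node degree that is later used to rank genes, can be sketched as follows. The sketch assumes the interactions are available as (protein A, protein B, combined score) tuples with scores scaled to the 0 to 1 range; the protein names, function names and scores are hypothetical, and this is not the STRING or Cytoscape implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (protein A, protein B, combined score in [0, 1]); the scale is an assumption.
Edge = Tuple[str, str, float]


def high_confidence_edges(edges: Iterable[Edge], min_score: float = 0.9) -> List[Edge]:
    """Keep interactions whose combined score exceeds the cut-off."""
    return [edge for edge in edges if edge[2] > min_score]


def node_degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Count how many retained interactions each protein takes part in."""
    degree: Counter = Counter()
    for a, b, _ in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Toy, made-up network for illustration only.
toy_edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.91),
    ("P2", "P3", 0.70),
    ("P3", "P4", 0.99),
]
kept = high_confidence_edges(toy_edges)
degrees = node_degrees(kept)
print(degrees)
```

Genes can then be ranked by degree, with a threshold on the degree serving as a simple hub criterion.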

COL gene expression in normal and tumor samples

We used Oncomine to investigate the mRNA levels of COLs in normal and tumor tissues. We retrieved twelve COL family genes from the Oncomine database. P values for the comparisons were generated with Student’s t-test. The fold change and P value cut-offs were set to 2 and 0.01, respectively. The expression of COL genes in normal and gastric tumor tissues was also studied using the TCGA-STAD database.

COL gene methylation in normal versus tumor tissues

We compared the DNA methylation of COL genes between normal and GC tissues using the Human Pan-cancer Methylation database, MethHC. The correlation between COL mRNA expression and methylation in GC patients was also analyzed. Promoter regions were selected for analysis, and the average value was used to evaluate methylation levels.
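As an illustration of how a correlation between promoter methylation and mRNA expression might be quantified, the sketch below computes a Pearson correlation coefficient. The choice of Pearson's r is an assumption made for this example; the text does not state which coefficient the database reports. The function name and values are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / sqrt(var_x * var_y)


# Made-up values: methylation rising while expression falls linearly.
methylation = [1.0, 2.0, 3.0, 4.0]
expression = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(methylation, expression)
print(r)
```

A strongly negative r would be the pattern expected if promoter methylation silenced the gene; values near zero indicate no clear linear relationship.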

Prognostic values of COL members in GC patients

The Kaplan-Meier plotter online database was used to analyze the relationship between COL expression and overall survival (OS), first progression (FP), and post-progression survival (PPS) in GC patients. The median COL expression was used as the cut-off. Log-rank P values and hazard ratios, with 95% confidence intervals (CIs), were calculated.
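The median split and log-rank comparison described above can be sketched in plain Python. This is a generic two-group log-rank test written for illustration, not the Kaplan-Meier plotter's code; assigning values equal to the median to the low group is an assumption, hazard ratios are not computed, and all names and data are hypothetical.

```python
from math import erfc, sqrt
from statistics import median
from typing import List, Sequence, Tuple

# (follow-up time, event flag): True = event observed, False = censored.
Obs = Tuple[float, bool]


def split_by_median(expr: Sequence[float], obs: Sequence[Obs]) -> Tuple[List[Obs], List[Obs]]:
    """Return (high, low) groups; values above the median count as high."""
    cut = median(expr)
    high = [o for e, o in zip(expr, obs) if e > cut]
    low = [o for e, o in zip(expr, obs) if e <= cut]
    return high, low


def logrank(group1: Sequence[Obs], group2: Sequence[Obs]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    pooled = list(group1) + list(group2)
    event_times = sorted({t for t, event in pooled if event})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for time, _ in group1 if time >= t)  # at risk in group 1
        n = sum(1 for time, _ in pooled if time >= t)   # at risk overall
        d1 = sum(1 for time, event in group1 if event and time == t)
        d = sum(1 for time, event in pooled if event and time == t)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Toy, made-up data for illustration only (not from the study).
expression = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
follow_up = [(1, True), (6, True), (2, True), (5, True), (3, True), (4, True)]
high, low = split_by_median(expression, follow_up)
chi2, p_value = logrank(high, low)
print(round(chi2, 3), round(p_value, 4))
```

In the toy data every high-expression patient has an event before any low-expression patient, so the test statistic is large relative to the sample size.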

Cell culture, RNA extraction and real-time quantitative PCR

Human GC cells (AGS, MKN45, HGC27, SGC7901) and human gastric mucosal epithelial cells (GES-1) purchased from ATCC were cultured in DMEM supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin. Cells were maintained at 37 °C in a 5% CO2 atmosphere. Total RNA was extracted from cell samples using an Animal Total RNA Isolation Kit (Foregene, China). After quality control, total RNA was reverse transcribed into cDNA using a RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific, Waltham, MA, USA). A SYBR Green I™ premix Ex Taq™ II reagent kit (Takara Biomedical Technology, Guangzhou, China) was used to amplify and quantify the cDNA templates. All PCR reactions were set up and run according to the manufacturer's instructions. Primers for COL3A1 were 5'-GAAAGAGGATCTGAGGGCTCC-3' (forward) and 5'-AAACCGCCAGCTTTTTCACC-3' (reverse) and those for COL5A1 were 5'-CTGACAAGAAGTCCGAAGGGG-3' (forward) and 5'-CGTCCACATAGGAGAGCAGTTT-3' (reverse). Primers for β-actin were 5'-AACTGGGACGACATGGAGAAAA-3' (forward) and 5'-GGATAGCACAGCCTGGATAGCA-3' (reverse). The 2^−ΔΔCt method was used to calculate the expression levels of target genes.
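The relative quantification can be written out as a short worked example. The sketch assumes β-actin as the reference gene and GES-1 as the control cell line, as described above; the function name and the Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_control: float,
    ct_reference_control: float,
) -> float:
    """Fold change of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed per cell line
    ddCt = dCt(sample cell line) - dCt(control cell line)
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


# Made-up Ct values: target gene and reference gene in a GC line and in the control line.
fold = relative_expression(
    ct_target_sample=22.0,
    ct_reference_sample=16.0,
    ct_target_control=26.0,
    ct_reference_control=17.0,
)
print(fold)
```

Here the sample dCt is 6 and the control dCt is 9, giving ddCt = −3 and a fold change of 2^3 = 8 relative to the control line.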

Statistical analysis

Normally distributed data were expressed as mean ± standard deviation (x̄ ± SD). To examine statistical differences in mRNA expression levels and DNA methylation levels between normal and tumor tissues in GC patients, a two-tailed unpaired Student’s t-test was used, and P<0.05 was considered to indicate a statistically significant difference. The RT-qPCR data were analyzed with GraphPad Prism 7 software using the t-test.


Identification of DEGs and GO enrichment and KEGG analyses

After screening the microarray results according to our criteria, we identified DEGs in each series (3,160 in GSE54129, 1,581 in GSE79973, 428 in GSE26899 and 445 in GSE29272). The overlap among the four gene expression series contained 407 genes, as shown in the Venn diagram (Figure 1), consisting of 275 downregulated genes (Figure 1A) and 132 upregulated genes (Figure 1B). We uploaded the upregulated and downregulated genes among the 407 overlapping genes to the DAVID online analysis tool to determine significantly enriched GO terms and KEGG pathways. GO analysis showed that upregulated DEGs were involved mainly in extracellular matrix (ECM) organization in biological process (BP), the ECM in cellular component (CC), and ECM structural constituent in molecular function (MF). Downregulated DEGs were involved mainly in ethanol oxidation in BP and ADH activity in MF (Table 1). The significantly enriched KEGG pathways of the DEGs are shown in Table 2. Upregulated genes were enriched mainly in ECM-receptor interaction, focal adhesion, protein digestion and absorption, amoebiasis and the PI3K-Akt signaling pathway. Downregulated genes were enriched mainly in chemical carcinogenesis, retinol metabolism, glycolysis/gluconeogenesis, metabolism of xenobiotics by cytochrome P450 and drug metabolism-cytochrome P450.

Figure 1 The distribution of differentially expressed genes among the gene expression series GSE26899, GSE29272, GSE79973 and GSE54129. (A) The distribution of upregulated genes. (B) The distribution of downregulated genes.
Table 1 The enriched Gene Ontology terms of up-regulated and down-regulated genes
Table 2 The enriched KEGG pathways of up-regulated and down-regulated genes

DEG PPI network analyses

PPI networks involving 150 DEGs (75 downregulated genes and 75 upregulated genes) were constructed (Figure 2A), excluding the DEGs that could not be connected to a network. With the cut-off criterion set as degree ≥12, 26 genes were selected as hub genes, including quiescin sulfhydryl oxidase-1 (QSOX1), fibronectin-1 (FN1), tissue inhibitor of metalloproteinases-1 (TIMP1), complement C3 (C3), collagen 18A1 (COL18A1), mesothelin (MSLN) and collagen 1A1 (COL1A1). Cytoscape was used to obtain the 5 most significant submodules (Figure 2B-F). In these submodules, we found several members of the insulin-like growth factor binding protein (IGFBP) gene family (first submodule), collagen (COL) gene family (second submodule), and serpin peptidase inhibitor (SERPIN) gene family (fourth submodule). These results suggested that IGFBP, COL and SERPIN family members may play an essential role in the development of GC. Functional enrichment of the second submodule, which involved collagen gene family members, showed enrichment of ECM organization in BP (consistent with the GO analysis of the upregulated DEGs), platelet-derived growth factor binding in MF, and collagen trimer in CC. Other submodules are detailed in Table S1.

Figure 2 Module (A) and 5 submodules (B-F) of the protein-protein interaction (PPI) network. Line color indicates the type of interaction evidence. The number of lines between two genes indicates the level of interaction between the two genes.
Table S1 Features of module and five submodules of protein-protein interaction (PPI) networks

Up-regulation of COLs in GC patients

In the GO and KEGG enrichment analyses, several members of the collagen gene family were frequently enriched and, in the PPI network analysis, several COL genes were involved in the second most significant submodule. Therefore, the twelve COL family genes among the DEGs (COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL5A2, COL6A2, COL6A3, COL8A1, COL17A1 and COL18A1) were chosen for more in-depth analysis. To better understand the potential relationship between GC and collagen genes, we used the Oncomine and TCGA-STAD databases to examine the mRNA expression levels of COL genes in normal and gastric tumor tissue. We assessed the expression differences of COLs in 20 cancer samples and their paired normal tissues in the Oncomine database. In these tumor datasets, COL isoforms were significantly upregulated in breast cancer, esophageal cancer, GC, head and neck cancer and pancreatic cancer (Figure 3) compared to matched normal tissues. In both the Oncomine and TCGA-STAD databases, the COLs were significantly upregulated in gastric tumor tissues (Figures 3,4), except for COL6A2 and COL17A1 (data not shown). The details of COL gene expression in all GC datasets in the Oncomine database are shown in Table S2.

Figure 3 mRNA levels of collagen isoforms in different cancers (Oncomine). The numbers of datasets with statistically significant collagen mRNA down-regulation (blue) or up-regulation (red) (different cancers versus corresponding normal tissues) are shown. Threshold settings: gene rank, top 10%; fold change, 2; P value, 0.01. The numbers in the colored boxes represent the numbers of datasets meeting the threshold.
Figure 4 mRNA expression of collagen family genes differed between primary tumor and corresponding normal tissues in gastric cancer patients according to the UALCAN database (A-J). Blue boxes represent normal tissue; red boxes represent tumor tissue. Only results with P<0.05 are shown.
Table S2 The mRNA levels of collagen isoforms in normal and different types of gastric cancer tissues (ONCOMINE)

DNA methylation of COL genes in GC patients

To explore the role of methylation in the regulation of COL expression in GC patients, the MethHC database was used to analyze the methylation level of the COL gene promoter regions and the relationship between DNA methylation level and mRNA expression level. Among the COL members, the methylation levels differed significantly between normal and cancer samples (P<0.05, Figure 5), except for COL8A1. Notably, DNA methylation of most COLs (10/11) in GC was higher than in the matched normal tissue, whereas that of COL5A2 was lower (Figure 5). The relationships between DNA methylation and mRNA expression of COL members in GC are listed in Table S3; the R values did not indicate a clear relationship between mRNA level and DNA methylation.

Figure 5 The methylation of collagen isoforms in gastric cancer and normal tissues (MethHC). Box plots in red color represent cancer samples and those in green color represent normal samples. **, indicates P<0.005. GC, gastric cancer.
Table S3 The relationship between DNA methylation and mRNA expression in the collagen gene members of gastric cancer patients (MethHC)

Prognostic characteristics of COLs in GC patients

Prognostic characteristics of GC patients, including OS, first progression (FP), and post-progression survival (PPS), were surveyed in the Kaplan-Meier plotter database. Among the COLs available in the Kaplan-Meier plotter database, high expression of most genes was associated with significantly worse OS in GC patients (Figure 6A), except for COL3A1, COL5A2 and COL17A1. FP was reduced with low COL17A1 expression (Figure 6B) and with high expression of the other collagen genes. A significant inverse relationship was observed between PPS and collagen gene expression, except for COL5A2 and COL17A1 (Figure 6C). High COL1A1, COL1A2, COL4A1, COL4A2, COL5A1, COL6A2, COL6A3, COL8A1 and COL18A1 mRNA expression levels were associated with reduced OS, FP and PPS in GC patients. Furthermore, increased COL17A1 mRNA levels correlated significantly only with increased FP, and not with OS or PPS. In Lauren’s classification, GC is divided into three categories: diffuse, intestinal and mixed. We therefore used the Kaplan-Meier plotter online tool to determine the prognostic value of COL genes in the different GC subtypes. The data showed that high expression levels of COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL6A2, COL6A3, and COL18A1 were associated with reduced OS, FP and PPS in intestinal and diffuse-type GC patients. In mixed-type GC patients, most of the COLs showed no significant association, because the number of cases was too small for statistical analysis (Table S4). COL17A1 transcript levels had no prognostic effect in the three subtypes, except for OS in the intestinal type, which corresponded to the finding that COL17A1 mRNA expression did not differ between normal and tumor tissues. The relationships between survival (OS, FP, PPS) and COL mRNA expression in these GC subtypes are shown in the supplementary materials (Figures S1-S3).

Figure 6 Different mRNA levels of collagen genes prognostic values in gastric cancer patients (Kaplan-Meier plotter). Kaplan-Meier plots show the relationship between OS (A), FP (B) and PPS (C) and the expression of collagens in gastric cancer patients, with hazard ratio (HR) and statistical significance.
Table S4 The prognostic values of collagen isoforms in different subtypes of gastric cancer patients (Kaplan-Meier plotter)
Figure S1 Different mRNA level of collagens’ prognostic values in diffuse subtype gastric cancer patients (Kaplan-Meier plotter). Notes: Kaplan-Meier plots show the relationship between OS (A), FP (B) and PPS (C) and the expression of collagens in gastric cancer patients, respectively, with hazard ratio (HR) and statistical significance.
Figure S2 Different mRNA level of collagens’ prognostic values in intestinal subtype gastric cancer patients (Kaplan-Meier plotter). Notes: Kaplan-Meier plots show the relationship between OS (A), FP (B) and PPS (C) and the expression of collagens in gastric cancer patients, respectively, with hazard ratio (HR) and statistical significance.
Figure S3 Different mRNA level of collagens’ prognostic values in mixed subtype gastric cancer patients (Kaplan-Meier plotter). Notes: Kaplan-Meier plots show the relationship between OS (A), FP (B) and PPS (C) and the expression of collagens in gastric cancer patients, respectively, with hazard ratio (HR) and statistical significance.

mRNA expression of COL3A1 and COL5A1 in different GC cells

Except for COL6A2 and COL17A1, the COLs were highly expressed according to the TCGA-STAD database. We chose the COL3A1 and COL5A1 genes, which lacked experimental verification in GC but have already been shown to play roles in other cancers (15,16), for RT-qPCR experiments to validate our analysis results. As shown in Figure 7, compared with the GES-1 normal human gastric mucosal epithelial cell line, the COL3A1 level was 1.310×10^6-fold higher in HGC27, 185-fold higher in SGC7901, 96-fold higher in MKN45 and six-fold higher in AGS human GC cell lines; these results were consistent with the analyses from the TCGA-STAD database. COL5A1 was also highly expressed in HGC27, MKN45 and SGC7901, at about 3- to 7-fold, which was consistent with the above results. However, in the AGS cell line, COL5A1 was 400-fold lower than in GES-1. These results require additional in-depth exploration.

Figure 7 The expression of COL3A1 and COL5A1 mRNA in different gastric cancer cells. *, fold change from 2 to 10; ***, fold change from 100 to 500; ****, fold change higher than 1,000.


In recent years, significant efforts have been made to better understand the early diagnosis, targeted therapy and prognosis of GC (17,18). However, the OS of patients with GC remains unimproved, particularly in developing countries (19,20). Our study aimed to identify core genes with similar functions that are highly expressed in GC, compared to normal controls, and to reveal their underlying mechanisms. In the present study, we downloaded the gene expression series GSE79973, GSE26899, GSE54129 and GSE29272 from the GEO database and found 132 upregulated and 275 downregulated overlapping DEGs between GC and normal controls. GO term analysis showed that upregulated DEGs were related primarily to the ECM. As reported in previous studies, several ECM-related genes have an impact on the development of GC (21-23). Increased deposition of matrix proteins favors tumor progression by interfering with cell polarity and cell-cell adhesion and, ultimately, by amplifying growth factor signaling. As the most significant ECM component (24), collagen determines the functional properties of the matrix, and changes in the deposition or degradation of collagen can lead to a loss of ECM homeostasis. It has been reported that increased collagen cross-linking and deposition lead to tumor progression via increased integrin signaling (25). PPI network analysis showed that the IGFBP, SERPIN, and COL gene families were enriched in several submodules. Previous studies have shown that IGFBPs play a protective role in the process of GC development (26-28). However, in our integrated analysis, we found that IGFBP3, IGFBP4, and IGFBP7 were upregulated in GC patients compared with normal tissues. This might be a self-protection mechanism in GC patients, and additional experiments and analyses are required to investigate this unexpected finding. Wang et al. (29), Ju et al. (30), and Yang et al. (31) found that SERPINs can be used as novel prognostic factors in GC. Additionally, Tian et al. found that SERPINH1 was overexpressed in GC patients and took part in the regulation of epithelial-mesenchymal transition (EMT) (32), which supported the results of our analysis.

Among the identified DEGs, 12 collagen genes were found. Except for COL6A2, COL8A1, COL17A1 and COL5A2, these collagen genes were found to have high mRNA and DNA methylation levels. DNA methylation in promoter regions is generally associated with gene silencing. Our results showed, however, high DNA methylation in the promoter region (except for COL5A2) together with high mRNA levels in GC. This suggests that methylation in the promoter region did not reduce the mRNA expression levels of COL genes, and that methylation of another region or some other mechanism may have affected mRNA levels. Kaplan-Meier analysis revealed that high expression of most of the COLs was associated with significantly worse prognosis in GC patients, which supported the idea that COLs could be prognostic markers in GC patients. Previously, only a few COL isoforms had been reported to be involved in GC. Previous studies have demonstrated that upregulated expression of COL1A1 (33), COL1A2 and COL6A3 (34) enhanced the invasive properties of GC cells. COL4A3 was confirmed as a prognostic factor in GC (13). The role of other COLs in GC has not been reported (14). Accordingly, additional experimental verification is required to confirm our results and evaluate their meaning. It has been shown that COL3A1 can be a diagnostic marker in breast cancer and that COL5A1 plays a role in non-small cell lung cancer (15,16,35). Therefore, we chose these two COLs for RT-qPCR experiments. In repeated experiments, COL3A1 was highly expressed in the four GC cell lines, and COL5A1 was highly expressed except in AGS cells. The differing expression levels between GC cell lines prompted us to examine the differences between the cell lines. We found that, among these four GC cell lines, HGC27 had the highest degree of malignancy, while AGS had the lowest (36-39). These results are consistent with the expression levels of COL3A1 and COL5A1 in each of the cell lines, which provides a basis for COL3A1 and COL5A1 as markers for the progression and prognosis of GC.


Additional experimentation is required in order to determine whether the COL gene family can be utilized as markers of GC progression and prognosis. Our analysis provides a feasible basis for the idea that COLs may be used as progression and prognosis markers of GC.


We thank the staff of the Institute of Antibody Engineering, School of Laboratory Medicine and Biotechnology, Southern Medical University in Guangzhou, China, for their contributions.

Funding: This study was supported by the National Natural Science Foundation of China (Grant no. 31672536 and no. 81773271), the Department of Education of Guangdong Province (2017KZDXM088 and 2018KQNCX284), the Joint Fund of Basic and Applied Basic Research Fund of Guangdong Province (2019A1515110689) and the Guangdong Provincial Department of Science and Technology (2019B110233003). The funders had no role in study design, data collection, analysis, decision to publish or preparation of the manuscript.


Reporting Checklist: The authors have completed the MDAR checklist. Available at

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form. The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See:


  1. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394-424.
  2. Zhang H, Wang X, Huang H, et al. Hsa_circ_0067997 promotes the progression of gastric cancer by inhibition of miR-515-5p and activation of X chromosome-linked inhibitor of apoptosis (XIAP). Artif Cells Nanomed Biotechnol 2019;47:308-18.
  3. Wang YN, Xu F, Zhang P, et al. MicroRNA-575 regulates development of gastric cancer by targeting PTEN. Biomed Pharmacother 2019;113:108716.
  4. Shah MA, Xu RH, Bang YJ, et al. HELOISE: Phase IIIb Randomized Multicenter Study Comparing Standard-of-Care and Higher-Dose Trastuzumab Regimens Combined With Chemotherapy as First-Line Therapy in Patients With Human Epidermal Growth Factor Receptor 2-Positive Metastatic Gastric or Gastroesophageal Junction Adenocarcinoma. J Clin Oncol 2017;35:2558-67.
  5. Sedlazeck FJ, Lee H, Darby CA, et al. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2018;19:329-46.
  6. Discher DE, Smith L, Cho S, et al. Matrix Mechanosensing: From Scaling Concepts in 'Omics Data to Mechanisms in the Nucleus, Regeneration, and Cancer. Annu Rev Biophys 2017;46:295-315.
  7. Yamauchi M, Barker TH, Gibbons DL, et al. The fibrotic tumor stroma. J Clin Invest 2018;128:16-25.
  8. Wörmann SM, Song L, Ai J, et al. Loss of P53 Function Activates JAK2-STAT3 Signaling to Promote Pancreatic Tumor Growth, Stroma Modification, and Gemcitabine Resistance in Mice and Is Associated With Patient Survival. Gastroenterology 2016;151:180-93.e12.
  9. Yoshida T, Hashimura M, Kuwata T, et al. Transcriptional regulation of the alpha-1 type II collagen gene by nuclear factor B/p65 and Sox9 in the chondrocytic phenotype of uterine carcinosarcomas. Hum Pathol 2013;44:1780-8.
  10. Nagathihalli NS, Castellanos JA, Shi C, et al. Signal Transducer and Activator of Transcription 3, Mediated Remodeling of the Tumor Microenvironment Results in Enhanced Tumor Drug Delivery in a Mouse Model of Pancreatic Cancer. Gastroenterology 2015;149:1932-43.e9.
  11. Laklai H, Miroshnikova YA, Pickup MW, et al. Genotype tunes pancreatic ductal adenocarcinoma tissue tension to induce matricellular fibrosis and tumor progression. Nat Med 2016;22:497-505.
  12. Miskolczi Z, Smith MP, Rowling EJ, et al. Collagen abundance controls melanoma phenotypes through lineage-specific microenvironment sensing. Oncogene 2018;37:3166-82.
  13. Nie XC, Wang JP, Zhu W, et al. COL4A3 expression correlates with pathogenesis, pathologic behaviors, and prognosis of gastric carcinomas. Hum Pathol 2013;44:77-86.
  14. Xu S, Xu H, Wang W, et al. The role of collagen in cancer: from bench to bedside. J Transl Med 2019;17:309.
  15. Wang Y, Resnick MB, Lu S, et al. Collagen type III alpha1 as a useful diagnostic immunohistochemical marker for fibroepithelial lesions of the breast. Hum Pathol 2016;57:176-81.
  16. Souza P, Rizzardi F, Noleto G, et al. Refractory remodeling of the microenvironment by abnormal type V collagen, apoptosis, and immune response in non-small cell lung cancer. Hum Pathol 2010;41:239-48.
  17. Badiyan SN, Hallemeier CL, Lin SH, et al. Proton beam therapy for gastrointestinal cancers: past, present, and future. J Gastrointest Oncol 2018;9:962-71.
  18. Tan AC, Chan DL, Faisal W, et al. New drug developments in metastatic gastric cancer. Therap Adv Gastroenterol 2018;11:1756284818808072.
  19. Goetze OT, Al-Batran SE, Chevallay M, et al. Multimodal treatment in locally advanced gastric cancer. Updates Surg 2018;70:173-9.
  20. Chen W, Zheng R, Baade PD, et al. Cancer statistics in China, 2015. CA Cancer J Clin 2016;66:115-32.
  21. Umeda S, Kanda M, Miwa T, et al. Fraser extracellular matrix complex subunit 1 promotes liver metastasis of gastric cancer. Int J Cancer 2020;146:2865-76.
  22. Wang H, Chen H, Jiang Z, et al. Integrin subunit alpha V promotes growth, migration, and invasion of gastric cancer cells. Pathol Res Pract 2019;215:152531.
  23. Wu H, Qiao F, Zhao Y, et al. Downregulation of Long Non-coding RNA FALEC Inhibits Gastric Cancer Cell Migration and Invasion Through Impairing ECM1 Expression by Exerting Its Enhancer-Like Function. Front Genet 2019;10:255.
  24. Yan Q, Sui W, Xie S, et al. Expression and role of integrin-linked kinase and collagen IV in human renal allografts with interstitial fibrosis and tubular atrophy. Transpl Immunol 2010;23:1-5.
  25. Walker C, Mojares E, Del Río Hernández A. Role of Extracellular Matrix in Development and Cancer Progression. Int J Mol Sci 2018;19:3028. [Crossref] [PubMed]
  26. Luo C, Sun F, Zhu H, et al. Insulin-like growth factor binding protein-1 (IGFBP-1) upregulated by Helicobacter pylori and is associated with gastric cancer cells migration. Pathol Res Pract 2017;213:1029-36. [Crossref] [PubMed]
  27. Kim J, Kim WH, Byeon SJ, et al. Epigenetic Downregulation and Growth Inhibition of IGFBP7 in Gastric Cancer. Asian Pac J Cancer Prev 2018;19:667-75. [PubMed]
  28. Kim ST, Jang HL, Lee J, et al. Clinical Significance of IGFBP-3 Methylation in Patients with Early Stage Gastric Cancer. Transl Oncol 2015;8:288-94. [Crossref] [PubMed]
  29. Wang K, Wang B, Xing AY, et al. Prognostic significance of SERPINE2 in gastric cancer and its biological function in SGC7901 cells. J Cancer Res Clin Oncol 2015;141:805-12. [Crossref] [PubMed]
  30. Ju H, Lim B, Kim M, et al. SERPINE1 intron polymorphisms affecting gene expression are associated with diffuse-type gastric cancer susceptibility. Cancer 2010;116:4248-55. [Crossref] [PubMed]
  31. Yang J, Xiong X, Wang X, et al. Identification of peptide regions of SERPINA1 and ENOSF1 and their protein expression as potential serum biomarkers for gastric cancer. Tumour Biol 2015;36:5109-18. [Crossref] [PubMed]
  32. Tian S, Peng P, Li J, et al. SERPINH1 regulates EMT and gastric cancer metastasis via the Wnt/beta-catenin signaling pathway. Aging (Albany NY) 2020;12:3574-93. [Crossref] [PubMed]
  33. Shi Y, Duan Z, Zhang X, et al. Down-regulation of the let-7i facilitates gastric cancer invasion and metastasis by targeting COL1A1. Protein Cell 2019;10:143-8. [Crossref] [PubMed]
  34. Ao R, Guan L, Wang Y, et al. Silencing of COL1A2, COL6A3, and THBS2 inhibits gastric cancer cell proliferation, migration, and invasion while promoting apoptosis through the PI3k-Akt signaling pathway. J Cell Biochem 2018;119:4420-34. [Crossref] [PubMed]
  35. Pan J, Mor G, Ju W, et al. Viral Infection-Induced Differential Expression of LncRNAs Associated with Collagen in Mouse Placentas and Amniotic Sacs. Am J Reprod Immunol 2015;74:237-57. [Crossref] [PubMed]
  36. Barranco SC, Townsend CM Jr, Casartelli C, et al. Establishment and characterization of an in vitro model system for human adenocarcinoma of the stomach. Cancer Res 1983;43:1703-9. [PubMed]
  37. Lin C, Fu Z, Liu Y, et al. The establishment of human gastric carcinoma cell line (SGC7901). China Academic Journal Electronic Publishing House 1981;1:1-03.
  38. Akagi T, Kimoto T. Human cell line (HGC-27) derived from the metastatic lymph node of gastric cancer. Acta Med Okayama 1976;30:215-9. [PubMed]
  39. Naito Y, Kino I, Horiuchi K, et al. Promotion of collagen production by human fibroblasts with gastric cancer cells in vitro. Virchows Arch B Cell Pathol Incl Mol Pathol 1984;46:145-54. [Crossref] [PubMed]
Cite this article as: Weng K, Huang Y, Deng H, Wang R, Luo S, Wu H, Chen J, Long M, Hao W. Collagen family genes and related genes might be associated with prognosis of patients with gastric cancer: an integrated bioinformatics analysis and experimental validation. Transl Cancer Res 2020;9(10):6246-6262. doi: 10.21037/tcr-20-1726