We provide all datasets (mitochondrial AARSs, cytosolic AARSs, the independent dataset, and AARSs of unknown localization) for download. A total of 390 experimentally validated eukaryotic AARS protein sequences were obtained from the UniProt database. This dataset contains 117 mitochondrial and 162 cytosolic tRNA synthetases. A further 88 AARSs have no information on their sub-cellular localization. The remaining 23 AARSs are putative sequences, fragments, or chloroplastic sequences, and were not used.
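As a quick consistency check, the four categories described above can be tallied against the stated total of 390 sequences. This is only an illustrative sketch: the counts come from the text, while the dictionary keys are labels chosen here and are not names used in the downloadable files.

```python
# Counts are taken from the dataset description; the keys are illustrative labels.
composition = {
    "mitochondrial": 117,
    "cytosolic": 162,
    "unknown_localization": 88,
    "excluded": 23,  # putative, fragment, or chloroplastic sequences
}

total = sum(composition.values())
assert total == 390, f"expected 390 sequences, got {total}"

# Print each category with its share of the full UniProt set.
for label, count in composition.items():
    print(f"{label:22s}{count:4d}  ({count / total:.1%})")
```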
We reduced sequence redundancy among the 117 mitochondrial and 162 cytosolic AARSs using CD-HIT, creating non-redundant datasets at a 40% sequence identity cutoff. The resulting positive (mitochondrial AARS) and negative (cytosolic AARS) datasets contain 59 and 41 protein sequences, respectively.
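The redundancy-reduction step could be reproduced roughly as follows. This is a minimal sketch, not the exact procedure used here: the FASTA file names are hypothetical, and it assumes each class is clustered separately with the `cd-hit` executable on the `PATH`. The `-c` (identity threshold) and `-n` (word size) options follow the CD-HIT user guide, which recommends a word size of 2 for thresholds between 0.4 and 0.5.

```python
import shlex


def word_size_for(identity: float) -> int:
    """Return the word size the CD-HIT user guide recommends for a threshold."""
    if not 0.4 <= identity <= 1.0:
        raise ValueError("cd-hit supports protein identity thresholds from 0.4 to 1.0")
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    return 2


def build_cd_hit_command(input_fasta: str, output_fasta: str, identity: float = 0.4) -> list:
    """Build the cd-hit argument list for clustering at the given identity."""
    return [
        "cd-hit",
        "-i", input_fasta,    # input sequences in FASTA format
        "-o", output_fasta,   # representative (non-redundant) sequences
        "-c", str(identity),  # sequence identity threshold
        "-n", str(word_size_for(identity)),
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with the downloaded dataset files.
    for name in ("mitochondrial_aars", "cytosolic_aars"):
        command = build_cd_hit_command(f"{name}.fasta", f"{name}_nr40.fasta")
        print(shlex.join(command))
```

The script only prints the commands; passing each list to `subprocess.run(command, check=True)` would execute them once the input files exist.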