Organising Strings
When cataloguing a collection of genetic strings, we should have an established system by which to organize them. The standard method is to organize strings as they would appear in a dictionary, so that “APPLE” precedes “APRON”, which in turn comes before “ARMOR”.
Problem
Assume that an alphabet 𝒜 has a predetermined order; that is, we write the alphabet as a permutation 𝒜 = (a1, a2, …, ak), where a1 < a2 < ⋯ < ak. For instance, the English alphabet is organized as (A, B, …, Z).
Given two strings s and t having the same length n, we say that s precedes t in the lexicographic order (and write s < Lex_t) if the first symbol s[j] that doesn’t match t[j] satisfies sj < tj in 𝒜.
Given
A collection of at most 10 symbols defining an ordered alphabet, and a positive integer n (n ≤ 10).
Return
All strings of length n that can be formed from the alphabet, ordered lexicographically (use the standard order of symbols in the English alphabet).
Solution
First, let’s import the tools and read the FASTA file:
import itertools
with open("rosalind_lexf.txt") as f:
data = f.read().split()
letters = data[:-1]
n = int(data[-1])
Then, we find the all possible strings of given length n:
perm = itertools.product(letters, repeat = n)
output = []
for i, j in enumerate(list(perm)):
permutation = ''
for item in j:
permutation += str(item)
output.append(permutation)
We need to sort the strings in the list alphabetically:
output.sort()
Finally, we print the result as in rosalind’s desired format:
for item in output:
print(item, end="\n")
Important note:
This problem is taken from rosalind.info. Please visit ROSALIND to find out more about Bioinformatics problems. You may also clink onto links for the definitions of each terminology.