# Enumerating k-mers Lexicographically

Organising Strings

When cataloguing a collection of genetic strings, we should have an established system by which to organize them. The standard method is to organize strings as they would appear in a dictionary, so that “APPLE” precedes “APRON”, which in turn comes before “ARMOR”.

#### Problem

Assume that an alphabet 𝒜 has a predetermined order; that is, we write the alphabet as a permutation 𝒜 = (a1, a2, …, ak), where a1 < a2 < ⋯ < ak. For instance, the English alphabet is organized as (A, B, …, Z).

Given two strings s and t having the same length n, we say that s precedes t in the lexicographic order (and write s < Lex_t) if the first symbol s[j] that doesn’t match t[j] satisfies sj < tj in 𝒜.

##### Given

A collection of at most 10 symbols defining an ordered alphabet, and a positive integer n (n ≤ 10).

##### Return

All strings of length n that can be formed from the alphabet, ordered lexicographically (use the standard order of symbols in the English alphabet).

#### Solution

First, let’s import the tools and read the FASTA file:

```import itertools

with open("rosalind_lexf.txt") as f:
letters = data[:-1]
n = int(data[-1])
```

Then, we find the all possible strings of given length n:

```perm = itertools.product(letters, repeat = n)
output = []
for i, j in enumerate(list(perm)):
permutation = ''
for item in j:
permutation += str(item)
output.append(permutation)
```

We need to sort the strings in the list alphabetically:

```output.sort()
```

Finally, we print the result as in rosalind’s desired format:

```for item in output:
print(item, end="\n")
```
###### Important note:

This problem is taken from rosalind.info. Please visit ROSALIND to find out more about Bioinformatics problems. You may also clink onto links for the definitions of each terminology.