forked from LoicGrobol/python-im
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update geneteurs notebook, add anacycliques notebook
- Loading branch information
1 parent
da5c727
commit 609872e
Showing
2 changed files
with
424 additions
and
43 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,377 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Anacycliques\n", | ||
"\n", | ||
"Trouver tous les anacycliques dans une liste de mots (issue de lexique.org par exemple). Les palindromes sont des cas particuliers d'anacycliques. \n", | ||
"Ex : 'été' est un palidrome \n", | ||
"'vu' et 'uv', 'tort' et 'trot' sont des anacycliques\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Tester si deux mots sont des anacycliques est très simple en Python. Pour cela on peut utiliser les slices avec le troisième argument pour le pas. \n", | ||
"`word[::-1]` : un pas de `-1` permet d'inverser la chaîne de caractères." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"def is_anacyclique(word1, word2):\n", | ||
" \"\"\"\n", | ||
" Returns True if the given words are palindromes\n", | ||
" False if not\n", | ||
" \"\"\"\n", | ||
" if word1 == word2[::-1]:\n", | ||
" return True\n", | ||
" else:\n", | ||
" return False" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"is_anacyclique(\"tort\", \"trot\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 19, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Nb de mots : 142641\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"words = []\n", | ||
"with open('lexique381.ortho', 'r') as f:\n", | ||
" for line in f:\n", | ||
" line = line.rstrip()\n", | ||
" if len(line) != 1:\n", | ||
" words.append(line)\n", | ||
"print(\"Nb de mots : {}\".format(len(words)))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Tester l'appartenance d'un élément à une liste de 142641 éléments est couteux en temps de traitement. \n", | ||
"Le temps du d'appartenance d'un élément à une liste est de 0(n) (voir [TimeComplexity](https://wiki.python.org/moin/TimeComplexity) sur le wiki Python). \n", | ||
"Ici comme la liste est déjà pas mal grande et que l'opération est répétée n fois, le temps de traitement est rhédibitoire." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Puisqu'on aime se compliquer la vie on peut essayer de réduite la taille de la liste en se limitant aux mots de même taille. Les anacycliques sont forcément de même taille." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"def words_by_length(words):\n", | ||
" \"\"\"\n", | ||
" Receives a list of words sorted by length, returns a list of words for each length\n", | ||
" \"\"\"\n", | ||
" for length in range(len(words[0]), len(words[-1]) + 1):\n", | ||
" yield [word for word in words if len(word) == length]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"res = set()\n", | ||
"sorted_words = sorted(words, key = lambda x:len(x))\n", | ||
"for words_n in words_by_length(sorted_words):\n", | ||
" for word in words_n:\n", | ||
" acyclique = word[::-1]\n", | ||
" if acyclique in words_n:\n", | ||
" res.add(tuple([word, acyclique]))\n", | ||
" \n", | ||
"for match in sorted(res):\n", | ||
" print(match)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"source": [ | ||
"Malgré le découpage du lexique en mots de même taille et l'utilisation des générateurs, le temps de traitement reste important. \n", | ||
"Essayons avec un dictionnaire." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 12, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Nb de mots : 125627\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from collections import defaultdict\n", | ||
"words_d = defaultdict()\n", | ||
"with open('lexique381.ortho', 'r') as f:\n", | ||
" for line in f:\n", | ||
" line = line.rstrip()\n", | ||
" if len(line) != 1:\n", | ||
" words_d[line] = \"\"\n", | ||
"print(\"Nb de mots : {}\".format(len(words_d.keys())))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"On a une différence de taille d'avec la liste parce que le dictionnaire supprime les doublons (125627 contre 142641). Mais ça ne change pas vraiment l'ordre du volume des données." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 15, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"('aa', 'aa')\n", | ||
"('ada', 'ada')\n", | ||
"('ados', 'soda')\n", | ||
"('adulé', 'éluda')\n", | ||
"('aga', 'aga')\n", | ||
"('agas', 'saga')\n", | ||
"('ah', 'ha')\n", | ||
"('ail', 'lia')\n", | ||
"('ailé', 'élia')\n", | ||
"('air', 'ria')\n", | ||
"('ako', 'oka')\n", | ||
"('alla', 'alla')\n", | ||
"('alloc', 'colla')\n", | ||
"('amis', 'sima')\n", | ||
"('an', 'na')\n", | ||
"('ana', 'ana')\n", | ||
"('anas', 'sana')\n", | ||
"('angor', 'rogna')\n", | ||
"('annoté', 'étonna')\n", | ||
"('ara', 'ara')\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"res = set()\n", | ||
"for word in words_d.keys():\n", | ||
" acyclique = word[::-1]\n", | ||
" if acyclique in words:\n", | ||
" res.add(tuple([word, acyclique]))\n", | ||
" \n", | ||
"for match in sorted(res)[:20]:\n", | ||
" print(match)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Là c'est tout de suite BEAUCOUP plus rapide <small>(supprimez `[:20]` pour avoir tous les résultats)</small>. Mais pourquoi ? \n", | ||
"Parce que le test d'appartenance pour un dictionnaire est estimé à O(1) en moyenne (O(n) dans le pire des cas). \n", | ||
"idem pour les ensembles (`set`)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 16, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Nb de mots : 125627\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"words_s = set()\n", | ||
"with open('lexique381.ortho', 'r') as f:\n", | ||
" for line in f:\n", | ||
" line = line.rstrip()\n", | ||
" if len(line) != 1:\n", | ||
" words_s.add(line)\n", | ||
"print(\"Nb de mots : {}\".format(len(words_s)))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 17, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"('aa', 'aa')\n", | ||
"('ada', 'ada')\n", | ||
"('ados', 'soda')\n", | ||
"('adulé', 'éluda')\n", | ||
"('aga', 'aga')\n", | ||
"('agas', 'saga')\n", | ||
"('ah', 'ha')\n", | ||
"('ail', 'lia')\n", | ||
"('ailé', 'élia')\n", | ||
"('air', 'ria')\n", | ||
"('ako', 'oka')\n", | ||
"('alla', 'alla')\n", | ||
"('alloc', 'colla')\n", | ||
"('amis', 'sima')\n", | ||
"('an', 'na')\n", | ||
"('ana', 'ana')\n", | ||
"('anas', 'sana')\n", | ||
"('angor', 'rogna')\n", | ||
"('annoté', 'étonna')\n", | ||
"('ara', 'ara')\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"res = set()\n", | ||
"for word in words_s:\n", | ||
" acyclique = word[::-1]\n", | ||
" if acyclique in words:\n", | ||
" res.add(tuple([word, acyclique]))\n", | ||
" \n", | ||
"for match in sorted(res)[:20]:\n", | ||
" print(match)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Allez on va s'amuser à mesurer ça :" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 25, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"CPU times: user 4 ms, sys: 0 ns, total: 4 ms\n", | ||
"Wall time: 3.02 ms\n", | ||
"CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n", | ||
"Wall time: 7.39 µs\n", | ||
"CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n", | ||
"Wall time: 8.82 µs\n" | ||
] | ||
}, | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"True" | ||
] | ||
}, | ||
"execution_count": 25, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"%time \"palindrome\" in words\n", | ||
"%time \"palindrome\" in words_d\n", | ||
"%time \"palindrome\" in words_s" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Si vous êtes intéressés par l'implémentation des dictionnaires vous pourrez lire l'article suivant : [Python dictionary implementation] (http://www.laurentluce.com/posts/python-dictionary-implementation/)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": true | ||
}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.5.2+" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 0 | ||
} |
Oops, something went wrong.