diff --git a/anacycliques.ipynb b/anacycliques.ipynb
new file mode 100644
index 0000000..781334a
--- /dev/null
+++ b/anacycliques.ipynb
@@ -0,0 +1,377 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Anacycliques\n",
+ "\n",
+ "Trouver tous les anacycliques dans une liste de mots (issue de lexique.org par exemple). Les palindromes sont des cas particuliers d'anacycliques. \n",
+ "Ex : 'été' est un palidrome \n",
+ "'vu' et 'uv', 'tort' et 'trot' sont des anacycliques\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Tester si deux mots sont des anacycliques est très simple en Python. Pour cela on peut utiliser les slices avec le troisième argument pour le pas. \n",
+ "`word[::-1]` : un pas de `-1` permet d'inverser la chaîne de caractères."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": [
+ "def is_anacyclique(word1, word2):\n",
+ " \"\"\"\n",
+ " Returns True if the given words are palindromes\n",
+ " False if not\n",
+ " \"\"\"\n",
+ " if word1 == word2[::-1]:\n",
+ " return True\n",
+ " else:\n",
+ " return False"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "is_anacyclique(\"tort\", \"trot\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Nb de mots : 142641\n"
+ ]
+ }
+ ],
+ "source": [
+ "words = []\n",
+ "with open('lexique381.ortho', 'r') as f:\n",
+ " for line in f:\n",
+ " line = line.rstrip()\n",
+ " if len(line) != 1:\n",
+ " words.append(line)\n",
+ "print(\"Nb de mots : {}\".format(len(words)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Tester l'appartenance d'un élément à une liste de 142641 éléments est couteux en temps de traitement. \n",
+ "Le temps du d'appartenance d'un élément à une liste est de 0(n) (voir [TimeComplexity](https://wiki.python.org/moin/TimeComplexity) sur le wiki Python). \n",
+ "Ici comme la liste est déjà pas mal grande et que l'opération est répétée n fois, le temps de traitement est rhédibitoire."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Puisqu'on aime se compliquer la vie on peut essayer de réduite la taille de la liste en se limitant aux mots de même taille. Les anacycliques sont forcément de même taille."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "def words_by_length(words):\n",
+ " \"\"\"\n",
+ " Receives a list of words sorted by length, returns a list of words for each length\n",
+ " \"\"\"\n",
+ " for length in range(len(words[0]), len(words[-1]) + 1):\n",
+ " yield [word for word in words if len(word) == length]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [],
+ "source": [
+ "res = set()\n",
+ "sorted_words = sorted(words, key = lambda x:len(x))\n",
+ "for words_n in words_by_length(sorted_words):\n",
+ " for word in words_n:\n",
+ " acyclique = word[::-1]\n",
+ " if acyclique in words_n:\n",
+ " res.add(tuple([word, acyclique]))\n",
+ " \n",
+ "for match in sorted(res):\n",
+ " print(match)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "Malgré le découpage du lexique en mots de même taille et l'utilisation des générateurs, le temps de traitement reste important. \n",
+ "Essayons avec un dictionnaire."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Nb de mots : 125627\n"
+ ]
+ }
+ ],
+ "source": [
+ "from collections import defaultdict\n",
+ "words_d = defaultdict()\n",
+ "with open('lexique381.ortho', 'r') as f:\n",
+ " for line in f:\n",
+ " line = line.rstrip()\n",
+ " if len(line) != 1:\n",
+ " words_d[line] = \"\"\n",
+ "print(\"Nb de mots : {}\".format(len(words_d.keys())))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "On a une différence de taille d'avec la liste parce que le dictionnaire supprime les doublons (125627 contre 142641). Mais ça ne change pas vraiment l'ordre du volume des données."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "('aa', 'aa')\n",
+ "('ada', 'ada')\n",
+ "('ados', 'soda')\n",
+ "('adulé', 'éluda')\n",
+ "('aga', 'aga')\n",
+ "('agas', 'saga')\n",
+ "('ah', 'ha')\n",
+ "('ail', 'lia')\n",
+ "('ailé', 'élia')\n",
+ "('air', 'ria')\n",
+ "('ako', 'oka')\n",
+ "('alla', 'alla')\n",
+ "('alloc', 'colla')\n",
+ "('amis', 'sima')\n",
+ "('an', 'na')\n",
+ "('ana', 'ana')\n",
+ "('anas', 'sana')\n",
+ "('angor', 'rogna')\n",
+ "('annoté', 'étonna')\n",
+ "('ara', 'ara')\n"
+ ]
+ }
+ ],
+ "source": [
+ "res = set()\n",
+ "for word in words_d.keys():\n",
+ " acyclique = word[::-1]\n",
+ " if acyclique in words:\n",
+ " res.add(tuple([word, acyclique]))\n",
+ " \n",
+ "for match in sorted(res)[:20]:\n",
+ " print(match)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Là c'est tout de suite BEAUCOUP plus rapide (supprimez `[:20]` pour avoir tous les résultats). Mais pourquoi ? \n",
+ "Parce que le test d'appartenance pour un dictionnaire est estimé à O(1) en moyenne (O(n) dans le pire des cas). \n",
+ "idem pour les ensembles (`set`)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Nb de mots : 125627\n"
+ ]
+ }
+ ],
+ "source": [
+ "words_s = set()\n",
+ "with open('lexique381.ortho', 'r') as f:\n",
+ " for line in f:\n",
+ " line = line.rstrip()\n",
+ " if len(line) != 1:\n",
+ " words_s.add(line)\n",
+ "print(\"Nb de mots : {}\".format(len(words_s)))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "('aa', 'aa')\n",
+ "('ada', 'ada')\n",
+ "('ados', 'soda')\n",
+ "('adulé', 'éluda')\n",
+ "('aga', 'aga')\n",
+ "('agas', 'saga')\n",
+ "('ah', 'ha')\n",
+ "('ail', 'lia')\n",
+ "('ailé', 'élia')\n",
+ "('air', 'ria')\n",
+ "('ako', 'oka')\n",
+ "('alla', 'alla')\n",
+ "('alloc', 'colla')\n",
+ "('amis', 'sima')\n",
+ "('an', 'na')\n",
+ "('ana', 'ana')\n",
+ "('anas', 'sana')\n",
+ "('angor', 'rogna')\n",
+ "('annoté', 'étonna')\n",
+ "('ara', 'ara')\n"
+ ]
+ }
+ ],
+ "source": [
+ "res = set()\n",
+ "for word in words_s:\n",
+ " acyclique = word[::-1]\n",
+ " if acyclique in words:\n",
+ " res.add(tuple([word, acyclique]))\n",
+ " \n",
+ "for match in sorted(res)[:20]:\n",
+ " print(match)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Allez on va s'amuser à mesurer ça :"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CPU times: user 4 ms, sys: 0 ns, total: 4 ms\n",
+ "Wall time: 3.02 ms\n",
+ "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n",
+ "Wall time: 7.39 µs\n",
+ "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n",
+ "Wall time: 8.82 µs\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "%time \"palindrome\" in words\n",
+ "%time \"palindrome\" in words_d\n",
+ "%time \"palindrome\" in words_s"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Si vous êtes intéressés par l'implémentation des dictionnaires vous pourrez lire l'article suivant : [Python dictionary implementation] (http://www.laurentluce.com/posts/python-dictionary-implementation/)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": true
+ },
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.5.2+"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/generateur.ipynb b/generateur.ipynb
index 85ce87b..0321d14 100644
--- a/generateur.ipynb
+++ b/generateur.ipynb
@@ -23,7 +23,7 @@
},
{
"cell_type": "code",
- "execution_count": 9,
+ "execution_count": 45,
"metadata": {
"collapsed": true
},
@@ -43,7 +43,7 @@
},
{
"cell_type": "code",
- "execution_count": 10,
+ "execution_count": 34,
"metadata": {
"collapsed": false
},
@@ -73,22 +73,11 @@
},
{
"cell_type": "code",
- "execution_count": 11,
+ "execution_count": null,
"metadata": {
"collapsed": false
},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n",
- "Wall time: 11.4 µs\n",
- "CPU times: user 596 ms, sys: 4 ms, total: 600 ms\n",
- "Wall time: 606 ms\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
"%time mots_a = with_a(mots)\n",
"mots_big = mots * 1000000\n",
@@ -108,7 +97,7 @@
},
{
"cell_type": "code",
- "execution_count": 1,
+ "execution_count": 46,
"metadata": {
"collapsed": true
},
@@ -125,7 +114,7 @@
},
{
"cell_type": "code",
- "execution_count": 12,
+ "execution_count": 47,
"metadata": {
"collapsed": false
},
@@ -134,15 +123,15 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "CPU times: user 0 ns, sys: 4 ms, total: 4 ms\n",
- "Wall time: 7.09 ms\n",
"CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n",
- "Wall time: 13.8 µs\n"
+ "Wall time: 76.8 µs\n",
+ "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n",
+ "Wall time: 8.82 µs\n"
]
}
],
"source": [
- "mots_big = mots * 1000\n",
+ "mots_big = mots * 100\n",
"%time mots_a = with_a(mots_big)\n",
"%time mots_a_gen = gen_with_a(mots_big)"
]
@@ -157,7 +146,7 @@
},
{
"cell_type": "code",
- "execution_count": 19,
+ "execution_count": 48,
"metadata": {
"collapsed": false
},
@@ -167,13 +156,18 @@
"output_type": "stream",
"text": [
"mots_a is a \n",
- "mots_a_gen is a \n"
+ "mots_a_gen is a \n",
+ "Taille de mots_a : 1672\n",
+ "Taille de mots_a_gen : 88\n"
]
}
],
"source": [
"print(\"mots_a is a {}\".format(type(mots_a)))\n",
- "print(\"mots_a_gen is a {}\".format(type(mots_a_gen)))"
+ "print(\"mots_a_gen is a {}\".format(type(mots_a_gen)))\n",
+ "import sys\n",
+ "print(\"Taille de mots_a : {}\".format(sys.getsizeof(mots_a)))\n",
+ "print(\"Taille de mots_a_gen : {}\".format(sys.getsizeof(mots_a_gen)))"
]
},
{
@@ -188,7 +182,7 @@
},
{
"cell_type": "code",
- "execution_count": 6,
+ "execution_count": 18,
"metadata": {
"collapsed": false
},
@@ -197,8 +191,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n",
- "Wall time: 568 µs\n"
+ "CPU times: user 4 ms, sys: 0 ns, total: 4 ms\n",
+ "Wall time: 5.68 ms\n"
]
}
],
@@ -224,7 +218,7 @@
},
{
"cell_type": "code",
- "execution_count": 22,
+ "execution_count": 24,
"metadata": {
"collapsed": false
},
@@ -232,16 +226,38 @@
{
"data": {
"text/plain": [
- " at 0x7f0eac423db0>"
+ "['chat', 'matin']"
]
},
- "execution_count": 22,
+ "execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "(mot for mots in mots if 'a' in mot)"
+ "[mot for mot in mots if 'a' in mot]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "collapsed": false
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " at 0x7fd2a44637d8>"
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "(mot for mot in mots if 'a' in mot)"
]
},
{
@@ -259,18 +275,6 @@
"display_name": "Python 3",
"language": "python",
"name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.5.2+"
}
},
"nbformat": 4,