This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Commit a18541a

pltrdy authored and urvashik committed
fixing encoding issues on cnn/dailymail (#1)
1 parent fc1fcd9 commit a18541a

1 file changed: +2 -2 lines changed

tensor2tensor/data_generators/cnn_dailymail.py

Lines changed: 2 additions & 2 deletions
@@ -102,7 +102,7 @@ def generate_hash(inp):
 
   urls = []
   for line in tf.gfile.Open(url_file):
-    urls.append(line.strip())
+    urls.append(line.strip().encode('utf-8'))
 
   filelist = []
   for url in urls:
@@ -132,7 +132,7 @@ def fix_run_on_sents(line):
   story = []
   summary = []
   reading_highlights = False
-  for line in tf.gfile.Open(story_file):
+  for line in tf.gfile.Open(story_file, "rb"):
     line = unicode(line.strip(), "utf-8") if six.PY2 else line.strip().decode("utf-8")
     line = fix_run_on_sents(line)
     if line == "":
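
The two changes can be illustrated with a minimal sketch, under stated assumptions. The body of `generate_hash` is not shown in this diff, so the version below is a hypothetical stand-in that computes a SHA-1 hex digest. Python's built-in `open` is used in place of `tf.gfile.Open`, and `read_urls` and `read_story_lines` are illustrative helper names that do not appear in the original file. The sketch targets Python 3 only.

```python
import hashlib


def generate_hash(inp):
  """Hypothetical stand-in: hex SHA-1 digest of a bytes input."""
  h = hashlib.sha1()
  # hashlib accepts bytes only; passing a str raises TypeError on Python 3.
  h.update(inp)
  return h.hexdigest()


def read_urls(url_file):
  """Read URLs as text, then encode each one to UTF-8 bytes (first hunk)."""
  with open(url_file, "r", encoding="utf-8") as f:
    return [line.strip().encode("utf-8") for line in f]


def read_story_lines(story_file):
  """Read raw bytes and decode them explicitly as UTF-8 (second hunk)."""
  with open(story_file, "rb") as f:
    return [line.strip().decode("utf-8") for line in f]
```

Opening the story file in binary mode means each line is always bytes, so the explicit `.decode("utf-8")` on the following line works and does not depend on the platform's default text encoding.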
