Trained first GPT2
jramcast committed May 14, 2021
1 parent f742eac commit ced282d
Showing 25 changed files with 3,183 additions and 170 deletions.
14 changes: 13 additions & 1 deletion .gitignore
@@ -138,4 +138,16 @@ dmypy.json
cython_debug/

dataset.csv
model.torch
model.torch


# Extension
out/
node_modules/

# Editor
.vscode

# Training
.output
runs
59 changes: 59 additions & 0 deletions build_dataset.py
@@ -0,0 +1,59 @@
import os
import re
from pathlib import Path
import pandas as pd
import random

from sklearn.utils import validation


TRAIN_PATH = "data/dataset_train.txt"
VALIDATION_PATH = "data/dataset_validation.txt"


def parse_sections(f):
    sections = []
    for line in f:
        line = line.rstrip()

        if (line.startswith("//")
                or line.startswith("ifndef")
                or line.startswith(":experiment")):
            continue

        if re.match(r"^=+ \w+", line):
            sections.append(line)
        else:
            try:
                sections[-1] += "\n" + line
            except IndexError:
                pass

    return sections


sections = []

# Find adoc files
home = str(Path.home())
coursedir = os.path.join(home, "Desarrollo")

for dirpath, dnames, fnames in os.walk(coursedir):
    for fname in fnames:
        if (fname.endswith(".adoc") and
                "guides" in dirpath and
                "en-US" in dirpath):
            filepath = os.path.join(dirpath, fname)
            print(filepath)
            with open(filepath, "r") as adoc_file:
                sections += parse_sections(adoc_file)

random.Random(42).shuffle(sections)
num_sections = len(sections)
train_size = int(num_sections * 0.8)

with open(TRAIN_PATH, "w") as f:
    f.write("\n".join(sections[:train_size]))

with open(VALIDATION_PATH, "w") as f:
    f.write("\n".join(sections[train_size:]))
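
The behaviour of `build_dataset.py` above can be tried without any `.adoc` files on disk. The following is a minimal sketch, not part of the commit: it restates the same section-parsing rules on an in-memory sample and applies the same seeded 80/20 split. The `SAMPLE` text, the `split_sections` helper, and its `train_fraction` and `seed` parameters are hypothetical names introduced here for illustration.

```python
import random
import re


def parse_sections(lines):
    """Group AsciiDoc lines into sections, each starting at a heading line."""
    sections = []
    for line in lines:
        line = line.rstrip()
        # Skip comments, ifndef directives, and the :experiment: attribute.
        if line.startswith(("//", "ifndef", ":experiment")):
            continue
        if re.match(r"^=+ \w+", line):
            # A heading such as "= Title" or "== Details" opens a new section.
            sections.append(line)
        elif sections:
            # Any other line is appended to the most recent section.
            sections[-1] += "\n" + line
        # Lines that appear before the first heading are dropped.
    return sections


def split_sections(sections, train_fraction=0.8, seed=42):
    """Shuffle with a fixed seed, then split into train and validation lists."""
    shuffled = list(sections)
    random.Random(seed).shuffle(shuffled)
    train_size = int(len(shuffled) * train_fraction)
    return shuffled[:train_size], shuffled[train_size:]


# Hypothetical sample input, standing in for the contents of an .adoc file.
SAMPLE = """\
:experiment:
preamble text
// a comment
= Title
Intro line.

== Details
ifndef::foo[]
More text.
"""

sections = parse_sections(SAMPLE.splitlines())
train, validation = split_sections(sections)
print(len(sections), len(train), len(validation))
```

Because the split is taken after shuffling whole sections, sections from the same source file can land in both the training and the validation file.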
3 changes: 3 additions & 0 deletions data/.gitignore
@@ -0,0 +1,3 @@
*
*/
!.gitignore
58 changes: 0 additions & 58 deletions dataset.py

This file was deleted.

22 changes: 22 additions & 0 deletions extension/rht-text-generator/.eslintrc.json
@@ -0,0 +1,22 @@
{
    "root": true,
    "parser": "@typescript-eslint/parser",
    "parserOptions": {
        "ecmaVersion": 6,
        "sourceType": "module"
    },
    "plugins": [
        "@typescript-eslint"
    ],
    "rules": {
        "@typescript-eslint/naming-convention": "warn",
        "@typescript-eslint/semi": "warn",
        "curly": "warn",
        "eqeqeq": "warn",
        "no-throw-literal": "warn",
        "semi": "off"
    },
    "ignorePatterns": [
        "**/*.d.ts"
    ]
}
11 changes: 11 additions & 0 deletions extension/rht-text-generator/.vscodeignore
@@ -0,0 +1,11 @@
.vscode/**
.vscode-test/**
out/test/**
src/**
.gitignore
.yarnrc
vsc-extension-quickstart.md
**/tsconfig.json
**/.eslintrc.json
**/*.map
**/*.ts
9 changes: 9 additions & 0 deletions extension/rht-text-generator/CHANGELOG.md
@@ -0,0 +1,9 @@
# Change Log

All notable changes to the "rht-text-generator" extension will be documented in this file.

Check [Keep a Changelog](http://keepachangelog.com/) for recommendations on how to structure this file.

## [Unreleased]

- Initial release
70 changes: 70 additions & 0 deletions extension/rht-text-generator/README.md
@@ -0,0 +1,70 @@
# rht-text-generator README

This is the README for your extension "rht-text-generator". After writing up a brief description, we recommend including the following sections.

## Features

Describe specific features of your extension including screenshots of your extension in action. Image paths are relative to this README file.

For example if there is an image subfolder under your extension project workspace:

\!\[feature X\]\(images/feature-x.png\)

> Tip: Many popular extensions utilize animations. This is an excellent way to show off your extension! We recommend short, focused animations that are easy to follow.

## Requirements

If you have any requirements or dependencies, add a section describing those and how to install and configure them.

## Extension Settings

Include if your extension adds any VS Code settings through the `contributes.configuration` extension point.

For example:

This extension contributes the following settings:

* `myExtension.enable`: enable/disable this extension
* `myExtension.thing`: set to `blah` to do something

## Known Issues

Calling out known issues can help limit users opening duplicate issues against your extension.

## Release Notes

Users appreciate release notes as you update your extension.

### 1.0.0

Initial release of ...

### 1.0.1

Fixed issue #.

### 1.1.0

Added features X, Y, and Z.

-----------------------------------------------------------------------------------------------------------
## Following extension guidelines

Ensure that you've read through the extensions guidelines and follow the best practices for creating your extension.

* [Extension Guidelines](https://code.visualstudio.com/api/references/extension-guidelines)

## Working with Markdown

**Note:** You can author your README using Visual Studio Code. Here are some useful editor keyboard shortcuts:

* Split the editor (`Cmd+\` on macOS or `Ctrl+\` on Windows and Linux)
* Toggle preview (`Shift+CMD+V` on macOS or `Shift+Ctrl+V` on Windows and Linux)
* Press `Ctrl+Space` (Windows, Linux) or `Cmd+Space` (macOS) to see a list of Markdown snippets

### For more information

* [Visual Studio Code's Markdown Support](http://code.visualstudio.com/docs/languages/markdown)
* [Markdown Syntax Reference](https://help.github.com/articles/markdown-basics/)

**Enjoy!**