Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,31 @@ | ||
# nwodkram | ||
written by JavaScript. Convert HTML to Markdown. | ||
# mdReserve | ||
Converts HTML text to Markdown text. Written in JavaScript. | ||
|
||
[Demo](https://abc1310054026.github.io/md-reserve/) | ||
|
||
* [Development progress](./development.md) | ||
* [English Version](./README_EN.md) | ||
## Installation | ||
npm: | ||
``` | ||
npm install md-reserve | ||
``` | ||
## Usage | ||
###### ES6 Modules | ||
```javascript | ||
import {MdReverse} from 'md-reverse'; | ||
|
||
const mdReserve = new MdReverse(); | ||
mdReserve.toMarkdown(`<h1>Hello World!</h1>`); | ||
``` | ||
###### ES5 | ||
```javascript | ||
var MR = require('md-reverse'); | ||
var mdReserve = new MR.MdReverse(); | ||
mdReserve.toMarkdown('<h1>Hello World!</h1>'); // plain string literal: template literals are ES6 syntax | ||
``` | ||
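## Notes | ||
* Page structure varies widely from site to site, so not every site can be matched. Conversion works best when the HTML was itself generated from Markdown or follows Markdown conventions. | ||
Accuracy may drop sharply for HTML that does not follow Markdown conventions. | ||
* Only Markdown's [basic syntax](https://www.markdownguide.org/basic-syntax) is supported at present. Support for the [extended syntax](https://www.markdownguide.org/extended-syntax) will be added gradually. | ||
* To improve conversion accuracy, `md-reserve` deletes HTML tags it cannot recognize. This limitation will be revisited once the extended syntax is implemented. |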
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# mdReserve | ||
Converts HTML to Markdown. Written in JavaScript. | ||
|
||
[Demo](https://abc1310054026.github.io/md-reserve/) | ||
|
||
* [Development progress](./development.md) | ||
* [See the Chinese version for more information](./README_EN.md) | ||
## Installation | ||
npm: | ||
``` | ||
npm install md-reserve | ||
``` | ||
## Usage | ||
###### ES6 Modules | ||
```javascript | ||
import {MdReverse} from 'md-reverse'; | ||
|
||
const mdReserve = new MdReverse(); | ||
mdReserve.toMarkdown(`<h1>Hello World!</h1>`); | ||
``` | ||
###### ES5 | ||
```javascript | ||
var MR = require('md-reverse'); | ||
var mdReserve = new MR.MdReverse(); | ||
mdReserve.toMarkdown('<h1>Hello World!</h1>'); // plain string literal: template literals are ES6 syntax | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
const presets = [ | ||
[ | ||
"@babel/preset-env", | ||
{ | ||
// targets: { | ||
// edge: "17", | ||
// firefox: "60", | ||
// chrome: "67", | ||
// safari: "11.1", | ||
// }, | ||
useBuiltIns: "usage", | ||
corejs: 3 | ||
}, | ||
], | ||
]; | ||
|
||
module.exports = { presets }; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
// import {MdReserve} from 'md-reserve'; | ||
var MR = require('md-reverse'); | ||
// css file | ||
import './styles.css'; | ||
|
||
const mdReserve = new MR.MdReverse(); | ||
|
||
const htmlArea = document.getElementById('html-area'); | ||
const mdArea = document.getElementById('markdown-area'); | ||
|
||
let md; | ||
htmlArea.addEventListener('input', (ev) => { | ||
md = mdReserve.toMarkdown(htmlArea.value); | ||
mdArea.value = md; | ||
}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
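# Implementation | ||
## Lexical analysis | ||
1. Receive the raw HTML text. | ||
2. Split the HTML elements into `opening tag`, `enclosed text content`, and `closing tag` pieces ([following HTML element syntax](https://developer.mozilla.org/zh-CN/docs/Glossary/HTML)). | ||
The result is an array that stores the pieces in the order they were split. | ||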
```html | ||
<div> | ||
<p>Hello</p> | ||
<p><a href="http://github.com">github</a></p> | ||
</div> | ||
``` | ||
**result**: `['<div>', '<p>', 'Hello', '</p>', '<p>', '<a href="http://github.com">', 'github', '</a>', '</p>', '</div>']` | ||
3. Output the result. | ||
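The splitting step above can be sketched in a few lines. This is only an illustration, not the project's lexer: `splitHtml` is a hypothetical name, and the regular expression assumes that no `>` character appears inside an attribute value.

```javascript
// Hypothetical helper (not part of the library): split HTML into
// opening tags, text content, and closing tags, in document order.
// Simplification: assumes '>' never occurs inside an attribute value.
function splitHtml(html) {
  // Either a whole tag (<...>) or a run of characters up to the next '<'.
  const pieces = html.match(/<[^>]+>|[^<]+/g) || [];
  return pieces
    .map((piece) => piece.trim())
    .filter((piece) => piece.length > 0); // drop whitespace-only text
}

const sample = `
<div>
  <p>Hello</p>
  <p><a href="http://github.com">github</a></p>
</div>`;

const result = splitHtml(sample);
console.log(result);
```

A character-by-character scanner that tracks quotes is the more robust choice when attribute values may contain `>`.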
## Syntax analysis | ||
1. Convert the lexical analysis output `result` into an array of `Token` objects. | ||
```typescript | ||
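// Token definition | ||
let Token: { | ||
tag: string, // HTML tag name of the token | ||
type: number, // numeric type identifier of the token | ||
attribute: { // map of the element's attribute names to values (null when there are none) | ||
[key: string]: string | ||
} | null, | ||
position: number // whether the token is an opening tag (1), a closing tag (2), or an empty element (3) | ||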
} | ||
``` | ||
**Token (example conversion result)**: | ||
```js | ||
let Token = [ | ||
{ | ||
tag: 'div', | ||
type: 1, | ||
attribute: null, | ||
position: 1 | ||
}, | ||
... | ||
] | ||
``` | ||
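As a rough illustration of this conversion, the sketch below turns a single tag string into a `Token`-shaped object. Everything in it is an assumption made for the example: the function name `toToken`, the `TYPE_IDS` table (only `div: 1` appears in the example above), the short list of empty elements, and the choice to return `null` for text pieces.

```javascript
// Hypothetical type identifiers; only `div: 1` comes from the example above.
const TYPE_IDS = { div: 1, p: 2, a: 3, br: 4 };
// A few empty (void) elements, enough for illustration.
const VOID_TAGS = new Set(['br', 'hr', 'img', 'input']);

// Convert one tag string into a Token-shaped object.
// Returns null for anything that is not a tag (for example text content).
function toToken(piece) {
  const match = /^<\s*(\/?)\s*([a-zA-Z][\w-]*)([^>]*?)\/?>$/.exec(piece);
  if (!match) return null;
  const [, slash, rawTag, rawAttrs] = match;
  const tag = rawTag.toLowerCase();

  // Collect double-quoted attributes; stays null when there are none.
  let attribute = null;
  const attrPattern = /([\w-]+)\s*=\s*"([^"]*)"/g;
  let attr;
  while ((attr = attrPattern.exec(rawAttrs)) !== null) {
    attribute = attribute || {};
    attribute[attr[1]] = attr[2];
  }

  // 1: opening tag, 2: closing tag, 3: empty element.
  let position = 1;
  if (slash) position = 2;
  else if (VOID_TAGS.has(tag)) position = 3;

  return { tag, type: TYPE_IDS[tag] || 0, attribute, position };
}

const tokens = ['<div>', '<a href="http://github.com">', 'github', '</a>', '</div>']
  .map(toToken)
  .filter(Boolean); // text pieces are skipped in this sketch
console.log(tokens);
```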
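## Intermediate code generation (Virtual DOM Tree) | ||
1. For each HTML element type, define its **identifier (`type`, a number), the child element types it allows, and the attributes it may keep**. | ||
2. Receive the result of lexical analysis and convert it into a tree structure similar to a `DOM Tree`. | ||
Tree nodes have the following type: | ||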
```typescript | ||
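// Node definition | ||
let Node: { | ||
type: number, // token type | ||
attribute: { // map of the element's attribute names to values | ||
[key: string]: string | ||
}, | ||
content: string | unknown[] | null, // token content: a string for text nodes, an array of child nodes for non-empty elements, null for empty elements | ||
isHTML: boolean, // whether the node is rendered as raw HTML | ||
isCode: boolean // whether the node is rendered as computer code | ||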
} | ||
``` | ||
3. Traverse the tree to make sure its structure is valid. | ||
4. Output the result. | ||
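One way to picture the tree-building step is a stack of open elements. The sketch below is an assumption-laden illustration, not the project's code: `buildTree` and `makeNode` are hypothetical names, text content is represented as plain strings in the input, type `0` is used for text nodes and the implicit root, and a closing tag simply closes the most recently opened element without checking that it matches.

```javascript
// Hypothetical node factory matching the Node shape described above.
function makeNode(type, attribute, content) {
  return { type, attribute, content, isHTML: false, isCode: false };
}

// Turn a flat sequence of tokens (plus plain strings standing in for
// text content) into a nested array of nodes.
function buildTree(items) {
  const root = makeNode(0, null, []); // implicit root, type 0 is an assumption
  const stack = [root];

  for (const item of items) {
    const parent = stack[stack.length - 1];
    if (typeof item === 'string') {
      parent.content.push(makeNode(0, null, item)); // text node
    } else if (item.position === 1) {
      const node = makeNode(item.type, item.attribute, []);
      parent.content.push(node);
      stack.push(node); // following items become its children
    } else if (item.position === 2) {
      if (stack.length > 1) stack.pop(); // close the current element
    } else {
      parent.content.push(makeNode(item.type, item.attribute, null)); // empty element
    }
  }
  return root.content;
}

const tree = buildTree([
  { tag: 'p', type: 2, attribute: null, position: 1 },
  'Hello',
  { tag: 'br', type: 4, attribute: null, position: 3 },
  { tag: 'p', type: 2, attribute: null, position: 2 },
]);
console.log(JSON.stringify(tree, null, 2));
```

A fuller version would compare each closing tag against the element on top of the stack, which is where the "traverse the tree and make sure its structure is valid" step could reject or repair bad nesting.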
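## Target code generation (Markdown Text) | ||
1. For each HTML element type, define its **opening-tag conversion rule and closing-tag conversion rule**. | ||
2. Receive the intermediate code. | ||
3. Generate the target code according to the opening- and closing-tag conversion rules. | ||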
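The opening/closing rule idea can be sketched as a lookup table keyed by node type, applied during a recursive walk. The type identifiers, the `RULES` table, and the function name `renderMarkdown` are all hypothetical, and the handling of unknown types (drop the tag, keep its children) is an assumption made for this example.

```javascript
// Hypothetical rule table: node type -> text emitted for the opening
// and closing tag. Type ids are illustrative only.
const RULES = {
  1: { open: () => '# ', close: () => '\n\n' },                          // h1
  2: { open: () => '', close: () => '\n\n' },                            // p
  3: { open: () => '[', close: (node) => `](${node.attribute.href})` },  // a
  4: { open: () => '**', close: () => '**' },                            // strong
};
const NO_RULE = { open: () => '', close: () => '' };

// Walk the tree depth-first and concatenate the converted pieces.
function renderMarkdown(nodes) {
  let out = '';
  for (const node of nodes) {
    if (typeof node.content === 'string') {
      out += node.content; // text node: emit as-is
      continue;
    }
    const rule = RULES[node.type] || NO_RULE; // unknown type: tag is dropped
    out += rule.open(node);
    if (Array.isArray(node.content)) out += renderMarkdown(node.content);
    out += rule.close(node);
  }
  return out;
}

const text = (value) => ({ type: 0, attribute: null, content: value });
const sampleTree = [
  { type: 1, attribute: null, content: [text('Hello World!')] },
  {
    type: 2,
    attribute: null,
    content: [{ type: 3, attribute: { href: 'http://github.com' }, content: [text('github')] }],
  },
];

console.log(renderMarkdown(sampleTree).trim());
```

Keeping the rules in a table means supporting a new element is a matter of adding one entry, which fits the plan of adding extended syntax later.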
# to-do List | ||
|
||
- [x] Lexical analysis | ||
- [x] Syntax analysis | ||
- [x] Intermediate code generation | ||
- [x] Target code generation | ||
- [ ] Finish the plugin module | ||
- [ ] Markdown extended syntax | ||
- [ ] Tables | ||
- [ ] Code Syntax Highlighting | ||
- [ ] Footnotes | ||
- [ ] Heading IDs | ||
- [ ] Definition Lists | ||
- [ ] Strikethrough | ||
- [ ] Task Lists | ||
- [ ] Automatic URL Linking |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
<!DOCTYPE html> | ||
<html lang="en"> | ||
<head> | ||
<meta charset="UTF-8"> | ||
<title>Title</title> | ||
</head> | ||
<body> | ||
|
||
<script type="text/javascript" src="demo.js"></script></body> | ||
</html> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
"use strict"; | ||
|
||
require("core-js/modules/es.array.for-each"); | ||
|
||
require("core-js/modules/es.object.define-property"); | ||
|
||
require("core-js/modules/es.object.keys"); | ||
|
||
require("core-js/modules/web.dom-collections.for-each"); | ||
|
||
Object.defineProperty(exports, "__esModule", { | ||
value: true | ||
}); | ||
|
||
var _mdReverse = require("./lib/mdReverse"); | ||
|
||
Object.keys(_mdReverse).forEach(function (key) { | ||
if (key === "default" || key === "__esModule") return; | ||
Object.defineProperty(exports, key, { | ||
enumerable: true, | ||
get: function get() { | ||
return _mdReverse[key]; | ||
} | ||
}); | ||
}); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
"use strict"; | ||
|
||
require("core-js/modules/es.array.slice"); | ||
|
||
require("core-js/modules/es.object.define-properties"); | ||
|
||
require("core-js/modules/es.object.define-property"); | ||
|
||
require("core-js/modules/es.string.trim"); | ||
|
||
Object.defineProperty(exports, "__esModule", { | ||
value: true | ||
}); | ||
exports.Lexer = Lexer; | ||
|
||
var _tools = require("./tools"); | ||
|
||
function Lexer() {} | ||
|
||
Object.defineProperties(Lexer.prototype, { | ||
analysis: { | ||
value: analysis | ||
} | ||
}); | ||
|
||
function analysis(str) { | ||
str = _tools.Tools.trim(str); | ||
console.time('lexical analysis'); | ||
var result = []; | ||
|
||
var stack = [], | ||
_char, | ||
start = 0, | ||
end = 0; | ||
|
||
for (var i = 0, len = str.length; i < len; i++) { | ||
_char = str[i]; | ||
|
||
switch (_char) { | ||
case '<': | ||
if (stack[stack.length - 1] !== '"') { | ||
stack.push(_char); | ||
start = i; | ||
if (start - end > 1 && !_tools.Tools.isEmpty(str, end + 1, start - 1)) result.push(str.slice(end + 1, start)); | ||
} | ||
|
||
break; | ||
|
||
case '>': | ||
if (stack[stack.length - 1] === '<') { | ||
stack.pop(); | ||
end = i; | ||
result.push(str.slice(start, end + 1)); | ||
} | ||
|
||
break; | ||
|
||
case '"': | ||
if (stack[stack.length - 1] === '"') { | ||
stack.pop(); | ||
} else if (stack[stack.length - 1] === '<') { | ||
stack.push(_char); | ||
} | ||
|
||
} | ||
} | ||
|
||
console.timeEnd('lexical analysis'); | ||
return result; | ||
} |