[feat] Adds language option #57

Open: wants to merge 1 commit into master

2 changes: 1 addition & 1 deletion .zuul.yml
@@ -19,6 +19,6 @@ browsers:
browserify:
- transform: babelify
scripts:
- "build/browser/transliteration.min.js"
- "lib/browser/transliteration.min.js"
username: node-transliteration
key: 45c0ecd6-f97f-480f-a050-1f35e24e732b
76 changes: 72 additions & 4 deletions README.md
@@ -102,24 +102,87 @@ __Options:__ (optional)
replace: { source1: target1, source2: target2, ... }, // Object form of argument
replace: [[source1, target1], [source2, target2], ... ], // Array form of argument
/* Strings in the ignore list will be bypassed from transliteration */
ignore: [str1, str2] // default: []
ignore: [str1, str2], // default: [],
/* Source language (allows using custom rules for specific languages) */
lang: 'bg' // default: ''
}
```

__transliterate.config([optionsObj])__

Bind options globally so any following calls will be using `optoinsObj` by default. If `optionsObj` argument is omitted, it will return current default option object.
Bind options globally so any following calls will be using `optionsObj` by default. If `optionsObj` argument is omitted, it will return current default option object.
```javascript
transliterate.config({ replace: [['你好', 'Hello']] });
transliterate('你好, world!'); // Result: 'Hello, world!'. This equals transliterate('你好, world!', { replace: [['你好', 'Hello']] });
```
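The equivalence stated above (an option bound with `config()` behaves as if it had been passed on every call) can be pictured as a layered object merge. This is a minimal sketch, not the library's code: `resolveOptions` and `DEFAULT_OPTIONS` are hypothetical names, and it assumes per-call options win over `config()` values, which in turn win over the built-in defaults.

```javascript
// Hypothetical defaults mirroring the documented transliterate options (assumption).
const DEFAULT_OPTIONS = { unknown: '[?]', replace: [], ignore: [], lang: '' };

// Later sources win: built-in defaults < global config() < per-call options.
function resolveOptions(globalConfig = {}, callOptions = {}) {
  return { ...DEFAULT_OPTIONS, ...globalConfig, ...callOptions };
}

const globalConfig = { replace: [['你好', 'Hello']] };
const resolved = resolveOptions(globalConfig, { lang: 'bg' });
console.log(resolved.replace); // the globally bound replace list
console.log(resolved.lang);    // prints bg, taken from the per-call options
```

Note that a shallow merge like this replaces a whole `replace` list rather than combining lists; whether the library combines them is not stated here.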

__transliterate.setLanguagesConfig([languagesConfig])__

Sets a custom languages configuration object of the following form (calling it without an argument returns the current configuration):
```js
{
'lang-name': {
replace: [['a', 'b']],
ignore: ['c']
},
'lang-name2': {
replace: [['c', 'd']]
}
}
```
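To make the expected shape concrete, here is a small validator for such an object. It is a hypothetical helper, not part of the library, and it assumes only the array form of `replace` shown above (a list of `[source, target]` string pairs) plus an optional `ignore` list of strings.

```javascript
// Hypothetical validator for the languages-config shape described above.
// Returns a list of human-readable problems; an empty list means the shape is OK.
function validateLanguagesConfig(config) {
  const errors = [];
  for (const [lang, rules] of Object.entries(config)) {
    if (rules === null || typeof rules !== 'object') {
      errors.push(`${lang}: rules must be an object`);
      continue;
    }
    const { replace = [], ignore = [] } = rules;
    if (!Array.isArray(replace)) {
      errors.push(`${lang}: replace must be an array`);
    } else {
      replace.forEach((pair, i) => {
        const isPair = Array.isArray(pair) && pair.length === 2
          && typeof pair[0] === 'string' && typeof pair[1] === 'string';
        if (!isPair) errors.push(`${lang}: replace[${i}] must be a [source, target] pair of strings`);
      });
    }
    if (!Array.isArray(ignore) || !ignore.every((s) => typeof s === 'string')) {
      errors.push(`${lang}: ignore must be an array of strings`);
    }
  }
  return errors;
}

validateLanguagesConfig({ 'lang-name': { replace: [['a', 'b']], ignore: ['c'] } }); // → []
validateLanguagesConfig({ bad: { replace: [['a']] } });
// → ['bad: replace[0] must be a [source, target] pair of strings']
```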

Control over the source language is needed because the same character can be transliterated differently depending on the language. For example, `щ` is pronounced differently in Russian than in Bulgarian, hence the different outputs: `shch` versus `sht`.

Sample configuration for the Bulgarian language:
```javascript
transliterate.setLanguagesConfig({
bg: {
replace: [
['ц', 'ts'],
['Ц', 'Ts'],
['щ', 'sht'],
['Щ', 'Sht'],
['ъ', 'a'],
['Ъ', 'A'],
['ь', 'y'],
['ѝ', 'i']
]
}
});

transliterate('щастие'); // 'shchastie'
transliterate('щастие', { lang: 'bg' }); // 'shtastie'
```
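The Bulgarian example works because the language-specific `replace` pairs take effect before the default character table. A minimal sketch of that two-stage idea follows. It assumes rules are applied in list order with plain substring replacement; `applyLanguageRules`, `LANGUAGES` and the tiny `FALLBACK` table are hypothetical stand-ins for the library's much larger default (Russian-style) mapping.

```javascript
// Tiny stand-in for the default per-character table (hypothetical, heavily truncated).
const FALLBACK = { 'щ': 'shch', 'а': 'a', 'с': 's', 'т': 't', 'и': 'i', 'е': 'e' };

// Hypothetical per-language overrides, mirroring the Bulgarian sample above.
const LANGUAGES = { bg: { replace: [['щ', 'sht'], ['Щ', 'Sht']] } };

function applyLanguageRules(text, lang) {
  const rules = (LANGUAGES[lang] && LANGUAGES[lang].replace) || [];
  // Stage 1: language-specific pairs, in list order; split/join replaces every occurrence.
  let out = rules.reduce((acc, [from, to]) => acc.split(from).join(to), text);
  // Stage 2: any character still covered by the default table is mapped; others pass through.
  out = Array.from(out, (ch) => (ch in FALLBACK ? FALLBACK[ch] : ch)).join('');
  return out;
}

applyLanguageRules('щастие');       // → 'shchastie'
applyLanguageRules('щастие', 'bg'); // → 'shtastie'
```

Under in-order application, multi-character sources such as the Ukrainian `'зг'` must appear before any rule for their individual letters, which is why digraph rules sit at the top of a list.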

Currently the following Cyrillic languages are supported:
- Russian (used by default if you don't specify a language)
- Bulgarian `'bg'`
- Macedonian `'mk'`
- Ukrainian `'ua'`
- Serbian `'rs'`

If you want to specify your own language rules but still use those provided in the library, you can use something like [deep-assign](https://www.npmjs.com/package/deep-assign) to extend the built-ins:

```javascript
import deepAssign from 'deep-assign';
import { transliterate } from 'transliteration';

const builtInConfig = transliterate.setLanguagesConfig();
transliterate.setLanguagesConfig(
deepAssign(
builtInConfig, {
'my-lang': {
replace: [['a', 'b']]
}
})
);
```
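Generic deep-merge utilities may treat arrays differently from what you want here (for example, merging them index by index instead of appending), which matters when you extend an existing language such as `bg` rather than adding a new one. A minimal dependency-free alternative is sketched below. `extendLanguagesConfig` is a hypothetical helper, and placing custom pairs first assumes rules are applied in list order so the first matching rule wins.

```javascript
// Hypothetical helper: extend a languages config without mutating the input.
// Custom `replace` pairs go before the built-in ones (assumes first matching rule wins);
// `ignore` lists are combined without duplicates.
function extendLanguagesConfig(builtIn, custom) {
  const result = {};
  const langs = new Set([...Object.keys(builtIn), ...Object.keys(custom)]);
  for (const lang of langs) {
    const base = builtIn[lang] || {};
    const extra = custom[lang] || {};
    result[lang] = {
      replace: [...(extra.replace || []), ...(base.replace || [])],
      ignore: [...new Set([...(base.ignore || []), ...(extra.ignore || [])])],
    };
  }
  return result;
}

const builtIn = { bg: { replace: [['щ', 'sht']] } };
const merged = extendLanguagesConfig(builtIn, {
  bg: { replace: [['ъ', 'u']] },       // hypothetical custom override
  'my-lang': { replace: [['a', 'b']] },
});
// merged.bg.replace → [['ъ', 'u'], ['щ', 'sht']]
// merged['my-lang'].replace → [['a', 'b']]
```

The result could then be passed to `transliterate.setLanguagesConfig()` in place of the `deepAssign(...)` result.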

__Example__
```javascript
import { transliterate as tr } from 'transliteration';
tr('你好,世界'); // Ni Hao , Shi Jie
tr('Γεια σας, τον κόσμο'); // Geia sas, ton kosmo
tr('안녕하세요, 세계'); // annyeonghaseyo, segye
tr('цвете'); // 'cvete'
tr('цвете', { lang: 'bg' }); // 'tsvete'
tr('你好,世界', { replace: {你: 'You'}, ignore: ['好'] }) // You 好, Shi Jie
tr('你好,世界', { replace: [['你', 'You']], ignore: ['好'] }) // You 好, Shi Jie (option in array form)
// or use configurations
@@ -143,14 +206,16 @@ __Options:__ (optional)
replace: { source1: target1, source2: target2, ... },
replace: [[source1, target1], [source2, target2], ... ], // default: []
/* Strings in the ignore list will be bypassed from transliteration */
ignore: [str1, str2] // default: []
ignore: [str1, str2], // default: []
/* Source language */
lang: 'bg' // default: ''
}
```
If `options` is not provided, it will use the above default values.

__slugify.config([optionsObj])__

Bind options globally so any following calls will be using `optoinsObj` by default. If `optionsObj` argument is omitted, it will return current default option object.
Bind options globally so any following calls will be using `optionsObj` by default. If `optionsObj` argument is omitted, it will return current default option object.
```javascript
slugify.config({ replace: [['你好', 'Hello']] });
slugify('你好, world!'); // Result: 'hello-world'. This equals slugify('你好, world!', { replace: [['你好', 'Hello']] });
@@ -163,6 +228,7 @@ slugify('你好,世界'); // ni-hao-shi-jie
slugify('你好,世界', { lowercase: false, separator: '_' }); // Ni_Hao_Shi_Jie
slugify('你好,世界', { replace: {你好: 'Hello', 世界: 'world'}, separator: '_' }); // hello_world
slugify('你好,世界', { replace: [['你好', 'Hello'], ['世界', 'world']], separator: '_' }); // hello_world (option in array form)
slugify('Цветя и щастие.', { lang: 'bg' }); // 'tsvetya-i-shtastie'
slugify('你好,世界', { ignore: ['你好'] }); // 你好shi-jie
// or use configurations
slugify.config({ lowercase: false, separator: '_' });
@@ -203,6 +269,7 @@ Options:
-u, --unknown Placeholder for unknown characters [string] [default: "[?]"]
-r, --replace Custom string replacement [array] [default: []]
-i, --ignore String list to ignore [array] [default: []]
-l, --lang Source language [string] [default: ""]
-S, --stdin Use stdin as input [boolean] [default: false]
-h, --help Show help [boolean]

@@ -225,6 +292,7 @@ Options:
-s, --separator Separator of the slug [string] [default: "-"]
-r, --replace Custom string replacement [array] [default: []]
-i, --ignore String list to ignore [array] [default: []]
--lang Source language [string] [default: ""]
-S, --stdin Use stdin as input [boolean] [default: false]
-h, --help Show help [boolean]

97 changes: 97 additions & 0 deletions data/languages-config.js
@@ -0,0 +1,97 @@
module.exports = {
/**
* Bulgarian language
* https://en.wikipedia.org/wiki/Romanization_of_Bulgarian#Comparison_table
* Standard: BGN/PCGN (2013)
*/
bg: {
replace: [
['ц', 'ts'],
['Ц', 'Ts'],
['щ', 'sht'],
['Щ', 'Sht'],
['ъ', 'a'],
['Ъ', 'A'],
['ь', 'y'],
['ѝ', 'i'],
],
},
/**
* Macedonian language
* https://en.wikipedia.org/wiki/Romanization_of_Macedonian#ISO_9_system
* Standard: Official Documents/Cadastre
*/
mk: {
replace: [
['Ѓ', 'Gj'],
['ѓ', 'gj'],
['Ѕ', 'Dz'],
['ѕ', 'dz'],
['Љ', 'Lj'],
['љ', 'lj'],
['Њ', 'Nj'],
['њ', 'nj'],
['є', 'ie'],
['Є', 'Ie'],
['Ќ', 'Kj'],
['ќ', 'kj'],
['Џ', 'Dj'],
['џ', 'dj'],
['ц', 'ts'],
['Ц', 'Ts'],
['щ', 'shch'],
['Щ', 'Shch'],
['И', 'Y'],
['и', 'y'],
],
},
/**
* Ukrainian language
* https://en.wikipedia.org/wiki/Romanization_of_Ukrainian
* Standard: Passport 2007
*/
ua: {
replace: [
['зг', 'zgh'],
['Зг', 'Zgh'],
['ґ', 'g'],
['Ґ', 'G'],
['є', 'ie'],
['Є', 'Ie'],
['ї', 'i'],
['Ї', 'I'],
['и', 'y'],
['И', 'Y'],
['х', 'kh'],
['Х', 'Kh'],
['ц', 'ts'],
['Ц', 'Ts'],
['щ', 'shch'],
['Щ', 'Shch'],
['ю', 'iu'],
['Ю', 'Iu'],
['я', 'ia'],
['Я', 'Ia'],
],
},
/**
* Serbian language
* https://en.wikipedia.org/wiki/Romanization_of_Serbian
*/
rs: {
replace: [
['Ђ', 'Dj'],
['ђ', 'dj'],
['Љ', 'Lj'],
['љ', 'lj'],
['Њ', 'Nj'],
['њ', 'nj'],
['Ћ', 'Ch'],
['ћ', 'ch'],
['Ч', 'Ch'],
['ч', 'ch'],
['Џ', 'Dj'],
['џ', 'dj'],
],
},
};
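Because a Latin character among the replacement sources would also rewrite ordinary Latin text (for example, a plain `s` mapped to `dz`), a quick check over a config like this one can catch such slips. This is a minimal sketch, assuming every `replace` source in this file is meant to be Cyrillic; `findNonCyrillicSources` is a hypothetical name, and the `\p{Script=Cyrillic}` property escape requires an ES2018-capable runtime (Node 10 or later).

```javascript
// Returns every [lang, source] pair whose source contains a non-Cyrillic character.
function findNonCyrillicSources(config) {
  const cyrillicOnly = /^\p{Script=Cyrillic}+$/u;
  const problems = [];
  for (const [lang, { replace = [] }] of Object.entries(config)) {
    for (const [source] of replace) {
      if (!cyrillicOnly.test(source)) problems.push([lang, source]);
    }
  }
  return problems;
}

// Inline sample: the second `mk` pair uses a Latin 's' instead of Cyrillic 'ѕ' (U+0455).
const sample = {
  mk: { replace: [['Ѕ', 'Dz'], ['s', 'dz']] },
  bg: { replace: [['щ', 'sht']] },
};
findNonCyrillicSources(sample); // → [['mk', 's']]
```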
1 change: 1 addition & 0 deletions gulpfile.babel.js
@@ -1,3 +1,4 @@
/* eslint-disable function-paren-newline */
import browserify from 'browserify';
import gulp from 'gulp';
import source from 'vinyl-source-stream';
14 changes: 11 additions & 3 deletions lib/bin/slugify
@@ -17,14 +17,17 @@ function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { de

const STDIN_ENCODING = 'utf-8'; // eslint-disable-line import/no-unresolved

/* eslint-disable no-console, function-paren-newline */

const options = {
lowercase: true,
separator: '-',
replace: [],
ignore: []
ignore: [],
lang: ''
};

const argv = _yargs2.default.version().usage('Usage: $0 <unicode> [options]').option('l', {
const { argv } = _yargs2.default.version().usage('Usage: $0 <unicode> [options]').option('l', {
alias: 'lowercase',
default: options.lowercase,
describe: 'Use lowercase',
@@ -44,14 +47,18 @@ const argv = _yargs2.default.version().usage('Usage: $0 <unicode> [options]').op
default: options.ignore,
describe: 'String list to ignore',
type: 'array'
}).option('lang', {
default: options.lang,
describe: 'Source language',
type: 'string'
}).option('S', {
alias: 'stdin',
default: false,
describe: 'Use stdin as input',
type: 'boolean'
}).help('h').option('h', {
alias: 'help'
}).example('$0 "你好, world!" -r 好=good -r "world=Shi Jie"', 'Replace `,` into `!` and `world` into `shijie`.\nResult: ni-good-shi-jie').example('$0 "你好,世界!" -i 你好 -i ,', 'Ignore `你好` and `,`.\nResult: 你好,shi-jie').wrap(100).argv;
}).example('$0 "你好, world!" -r 好=good -r "world=Shi Jie"', 'Replace `,` into `!` and `world` into `shijie`.\nResult: ni-good-shi-jie').example('$0 "你好,世界!" -i 你好 -i ,', 'Ignore `你好` and `,`.\nResult: 你好,shi-jie').wrap(100);

options.lowercase = !!argv.l;
options.separator = argv.separator;
@@ -66,6 +73,7 @@ if (argv.replace.length) {
}
}
options.ignore = argv.ignore;
options.lang = argv.lang;

if (argv.stdin) {
process.stdin.setEncoding(STDIN_ENCODING);
15 changes: 12 additions & 3 deletions lib/bin/transliterate
@@ -15,13 +15,16 @@ function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { de

const STDIN_ENCODING = 'utf-8'; // eslint-disable-line import/no-unresolved

/* eslint-disable no-console, function-paren-newline */

const options = {
unknown: '[?]',
replace: [],
ignore: []
ignore: [],
lang: ''
};

const argv = _yargs2.default.version().usage('Usage: $0 <unicode> [options]').option('u', {
const { argv } = _yargs2.default.version().usage('Usage: $0 <unicode> [options]').option('u', {
alias: 'unknown',
default: options.unknown,
describe: 'Placeholder for unknown characters',
@@ -36,14 +39,19 @@ const argv = _yargs2.default.version().usage('Usage: $0 <unicode> [options]').op
default: options.ignore,
describe: 'String list to ignore',
type: 'array'
}).option('l', {
alias: 'lang',
default: options.lang,
describe: 'Source language',
type: 'string'
}).option('S', {
alias: 'stdin',
default: false,
describe: 'Use stdin as input',
type: 'boolean'
}).help('h').option('h', {
alias: 'help'
}).example('$0 "你好, world!" -r 好=good -r "world=Shi Jie"', 'Replace `,` into `!`, `world` into `shijie`.\nResult: Ni good, Shi Jie!').example('$0 "你好,世界!" -i 你好 -i ,', 'Ignore `你好` and `,`.\nResult: 你好,Shi Jie !').wrap(100).argv;
}).example('$0 "你好, world!" -r 好=good -r "world=Shi Jie"', 'Replace `,` into `!`, `world` into `shijie`.\nResult: Ni good, Shi Jie!').example('$0 "你好,世界!" -i 你好 -i ,', 'Ignore `你好` and `,`.\nResult: 你好,Shi Jie !').wrap(100);

options.unknown = argv.u;
if (argv.replace.length) {
@@ -57,6 +65,7 @@ if (argv.replace.length) {
}
}
options.ignore = argv.ignore;
options.lang = argv.lang;

if (argv.stdin) {
process.stdin.setEncoding(STDIN_ENCODING);