How to add a new font to support Cyrillic? #9

Open
ciukstar opened this issue Sep 10, 2022 · 5 comments

@ciukstar

ciukstar commented Sep 10, 2022

Hello.

As I understand, the standard fonts provided by "Graphics.PDF.Fonts.Standard.Font" do not provide support for Cyrillic.
So I followed the HPDF/Test/onepage.hs example without success. The generated .pdf does not display Cyrillic text.
I have tried several other fonts besides DroidSans: OpenSans, DejaVuSans, FreeSerif, TimesNewRoman, all without success.

Note that I used fontforge to convert .ttf to .pfb and .afm files as below:
$ fontforge -lang=ff -c 'Open($1); Generate($2); Close();' DroidSans.ttf DroidSans.pfb
Some of them had issues with underscores (i.e. "_") and numeric digits (like "6") in the .afm files when parsed with HPDF.

Is there anything else besides the example given on HPDF/Test/onepage.hs?
Or maybe it needs to be done in a completely different way?

Env:

LTS Haskell 19.22 (ghc-9.0.2)
HPDF 1.6.0

Thanks.

@hsyl20
Owner

hsyl20 commented Nov 18, 2022

Hi,
Sadly I don't know how the existing .afm files were generated, nor if the parser supports the full spec (is there one?).

Perhaps @pkamenarsky knows more about this topic, as he made changes related to fonts in HPDF?

@unhammer

unhammer commented Jun 20, 2023

Did you ever figure this out? I also notice I don't get subscripts, e.g. with Times_Roman, the string "C₂₁H₃₀O₅" prints as CHO

alpheccar#12 said to have patience back in 2016 🧐

@ciukstar
Author

> Did you ever figure this out? I also notice I don't get subscripts, e.g. with Times_Roman, the string "C₂₁H₃₀O₅" prints as CHO

No, I couldn't figure it out. I have tried many different .pfb and .afm files (from CTAN), but it seems that HPDF doesn't support uniXXXXX tokens in .afm files, and when those tokens are missing from the .afm files it fails with the error *** Exception: head: empty list

There is a TODO.txt file which says that "The MAIN missing feature is the support for unicode and the bidirectionnal layout algorithm that will then be required".
I guess, it is just not implemented.

@unhammer

I tried printing a file containing ₂₁₃₀₅ to PDF from Firefox and opening the PDF in fontforge. It showed e.g. ₂ as SUBSCRIPT TWO, so that looked right. Then I did File→Generate→PS1(binary) to generate a pfb+afm and loaded it into https://github.com/hsyl20/HPDF/blob/master/Test/onepage.hs (changing debugtext to ₂₁₃₀₅ and the font paths to my generated pfb+afm), but I just got ☐☐☐☐☐. So it really does seem like something is missing with respect to Unicode (though I don't understand what it would have to do with bidi).

@unhammer

unhammer commented Jan 18, 2025

I grepped for 2080 (₀) in the .afm and found uni2080, so the afm which I extracted from the firefox pdf seems to call unicode 2080 uni2080. By adding this to glyphlist.txt:

uni2080;2080
uni2081;2081
uni2082;2082
uni2083;2083
uni2084;2084
uni2085;2085
uni2086;2086
uni2087;2087
uni2088;2088
uni2089;2089

and using the extracted .afm/.pfb I did actually get the subscripts to display.
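
The list above can also be generated rather than typed by hand. The following is a minimal sketch using only `base`; the names `hex4`, `glyphListEntry` and `glyphListEntries` are made up for illustration and are not part of HPDF. It assumes the .afm names each glyph `uniXXXX`, which is not guaranteed for every font.

```haskell
import Data.Char (ord, toUpper)
import Data.List (nub)
import Numeric (showHex)

-- | Uppercase hex of a code point, zero-padded to four digits,
-- e.g. U+2080 becomes "2080".
hex4 :: Char -> String
hex4 c = replicate (4 - length h) '0' ++ h
  where
    h = map toUpper (showHex (ord c) "")

-- | One line in the "name;codepoint" format used in the list above,
-- e.g. "uni2080;2080". Hypothetical helper, not an HPDF function.
glyphListEntry :: Char -> String
glyphListEntry c = "uni" ++ hex4 c ++ ";" ++ hex4 c

-- | Entries for every distinct non-ASCII character of a text, in order of
-- first appearance. Characters above U+FFFF are skipped, because the
-- four-digit uniXXXX form cannot name them.
glyphListEntries :: String -> [String]
glyphListEntries = map glyphListEntry . nub . filter wanted
  where
    wanted c = ord c > 127 && ord c <= 0xFFFF
```

For example, `mapM_ putStrLn (glyphListEntries "C₂₁H₃₀O₅")` prints one entry for each of the five distinct subscript digits in that string.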

https://github.com/hsyl20/HPDF/blob/master/Graphics/PDF/Fonts/Encoding.hs#L67-L74 seems to be the relevant code.

So what would be a good API for this? I can imagine some function which takes a text, creates a list like the above and makes an Encoding out of it. But one can't trust that uni2080 etc. are always used in .afm files: e.g. I tried putting ⊂ in an HTML file, and Firefox inserted a different font where the .afm called it subset (which is a name that does exist in glyphlist.txt).
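
One possible shape for such a lookup, as a sketch only: try a name table first (standing in for a parsed glyphlist.txt), then fall back to decoding `uniXXXX` names directly, so that no extra entries have to be added by hand. All names here (`uniNameToChar`, `glyphNameToChar`, `sampleTable`) are hypothetical and not HPDF's API, and the single `subset` entry is an assumed example of what the table would contain.

```haskell
import Control.Applicative ((<|>))
import Data.Char (chr, isDigit)
import Numeric (readHex)

-- | True for the characters allowed in a uniXXXX name: 0-9 and A-F.
isUpperHex :: Char -> Bool
isUpperHex c = isDigit c || c `elem` "ABCDEF"

-- | Decode a name of the form "uni" followed by exactly four uppercase
-- hex digits, e.g. "uni2080" to U+2080. Anything else gives Nothing.
uniNameToChar :: String -> Maybe Char
uniNameToChar ('u' : 'n' : 'i' : digits)
  | length digits == 4
  , all isUpperHex digits
  , [(n, "")] <- readHex digits = Just (chr n)
uniNameToChar _ = Nothing

-- | Resolve a glyph name: the table wins, uniXXXX decoding is the fallback.
glyphNameToChar :: [(String, Char)] -> String -> Maybe Char
glyphNameToChar table name = lookup name table <|> uniNameToChar name

-- | Illustrative stand-in for a parsed glyphlist.txt (assumed entry).
sampleTable :: [(String, Char)]
sampleTable = [("subset", '\x2282')]
```

With this, both `glyphNameToChar sampleTable "subset"` and `glyphNameToChar sampleTable "uni2080"` resolve to a character, while an unknown name yields `Nothing` instead of an exception.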

Ideally simply loading the font would give it the right encoding in its FontStructure. Using readType1Font pfb afmPath like I did ends up just using getEncoding AdobeStandardEncoding regardless. But maybe the actual encoding is somehow readable from pfb+afm?
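
On that last question: in the AFM format, each character metrics line has the form `C <code> ; WX <width> ; N <name> ; B ... ;`, where a code of `-1` marks a glyph that the font's own encoding does not reach. The sketch below pulls the (code, name) pairs out of such lines using only `base`. The function names are made up for illustration, and whether this table is enough for HPDF to build an Encoding is an open assumption, not something verified against its code.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)
import Text.Read (readMaybe)

-- | Split a string on a separator character, keeping empty chunks.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Extract (code, glyph name) from one character metrics line such as
-- "C 32 ; WX 250 ; N space ; B 0 0 0 0 ;". Other lines give Nothing.
charMetric :: String -> Maybe (Int, String)
charMetric line = do
  codeStr <- listToMaybe [v | ["C", v] <- fields]
  name    <- listToMaybe [n | ["N", n] <- fields]
  code    <- readMaybe codeStr
  Just (code, name)
  where
    fields = map words (splitOn ';' line)

-- | All glyphs of an .afm file's contents that carry a real code,
-- dropping the unencoded ones (code -1).
encodedGlyphs :: String -> [(Int, String)]
encodedGlyphs = filter ((>= 0) . fst) . mapMaybe charMetric . lines
```

Running `encodedGlyphs` over the contents of a generated .afm would show which glyph names are reachable through the font's built-in encoding and which ones (like a `uni2080` listed with code `-1`) are not.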
