[jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs


Andreas Joseph Krogh (Jira)

     [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Bowditch reassigned FOP-2210:
-----------------------------------

    Assignee: Chris Bowditch
   

> [PATCH] Complex script IF to output missing glyphs
> --------------------------------------------------
>
>                 Key: FOP-2210
>                 URL: https://issues.apache.org/jira/browse/FOP-2210
>             Project: Fop
>          Issue Type: Bug
>            Reporter: simon steiner
>            Assignee: Chris Bowditch
>         Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>
>
> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
> fop -c fop.xconf -ifin expected.if.xml out.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Luis Bernardo

Glenn,

Can you give your opinion about the approach used by Simon? As I
mentioned before (in a private message), the IF -> PS/PDF route does not
work in your original CS patch (for the languages that CS targets) due
to the mapped sequences. Simon's approach works but requires keeping the
original sequences alongside the mapped ones. I think it is a good
approach but I would like to know if you have a better suggestion before
we apply the patch.

Thanks,
Luis

On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:

>       [ https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Chris Bowditch reassigned FOP-2210:
> -----------------------------------
>
>      Assignee: Chris Bowditch
>      
>> [PATCH] Complex script IF to output missing glyphs
>> --------------------------------------------------
>>
>>                  Key: FOP-2210
>>                  URL: https://issues.apache.org/jira/browse/FOP-2210
>>              Project: Fop
>>           Issue Type: Bug
>>             Reporter: simon steiner
>>             Assignee: Chris Bowditch
>>          Attachments: csspeedtrunk.patch, fop.xconf, test.fo
>>
>>
>> fop test.fo -c fop.xconf -if application/pdf expected.if.xml
>> fop -c fop.xconf -ifin expected.if.xml out.pdf
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2
I'm at W3C WG meetings this week, but I'll try to get this on my schedule. I'm not sure what the IF->PS/PDF problem is, since the IF->PDF path is clearly working in my tests.


On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo <[hidden email]> wrote:

> Glenn,
>
> Can you give your opinion about the approach used by Simon? As I mentioned before (in a private message), the IF -> PS/PDF route does not work in your original CS patch (for the languages that CS targets) due to the mapped sequences. Simon's approach works but requires keeping the original sequences alongside the mapped ones. I think it is a good approach but I would like to know if you have a better suggestion before we apply the patch.
>
> Thanks,
> Luis



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2
Ah, I reread your earlier (private) message. I see the problem has to do with the use of synthesized PUA (Private Use Area) mappings. Ideally, a font would always have a CMAP entry for every glyph that the GSUB process can produce. Not all fonts do, so in the case in point we have to synthesize a mapping, and for that we turn to PUA assignments. This works when we generate PDF directly, since we generate a subset font that contains the synthesized mappings. However, I can see that if the output goes to IF instead of PDF/PS, then we need to find a way to recreate those synthesized mappings.

I think this information is really font-specific, and should not be tied to specific text nodes though. So if Simon's fix uses text nodes, then that is probably not the best approach.
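
To make the idea of a synthesized PUA mapping concrete, here is a minimal Python sketch. It is not FOP code and does not claim to reproduce what FOP does internally: the function name, the plain-dict `cmap` (code point to glyph ID), and the choice to assign codes in ascending glyph-ID order are all assumptions made for illustration.

```python
# Illustrative sketch only; names and assignment order are assumptions, not FOP's API.
PUA_START, PUA_END = 0xE000, 0xF8FF  # Unicode BMP Private Use Area


def synthesize_pua_mappings(cmap, gsub_glyphs):
    """Return {pua_code: gid} for glyphs GSUB can produce that lack a cmap entry.

    cmap        -- dict mapping code point -> glyph ID (the font's own mappings)
    gsub_glyphs -- iterable of glyph IDs that substitution may output
    """
    already_mapped = set(cmap.values())  # glyphs reachable via the font's cmap
    taken_codes = set(cmap)              # code points the font already uses
    synthesized = {}
    code = PUA_START
    for gid in sorted(set(gsub_glyphs) - already_mapped):
        # Skip PUA code points the font itself already occupies.
        while code in taken_codes:
            code += 1
        if code > PUA_END:
            raise ValueError("ran out of PUA code points")
        synthesized[code] = gid
        code += 1
    return synthesized


# Hypothetical font: two glyphs have cmap entries, three are GSUB-only.
font_cmap = {0x0628: 12, 0x0644: 40}
pua = synthesize_pua_mappings(font_cmap, [12, 139, 481, 219])
for code, gid in pua.items():
    print(f"U+{code:04X} -> gid {gid}")
```

The point of the sketch is that the result depends only on the font and on which glyphs are needed, which is why the mappings can be treated as font-level rather than text-node-level information.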


On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <[hidden email]> wrote:
> I'm at W3C WG meetings this week, but I'll try to get this on my schedule. I'm not sure what the IF->PS/PDF problem is, since the IF->PDF path is clearly working in my tests.



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Luis Bernardo

With the approach implemented by Simon, what gets written to the IF file is the original sequence, not the mapped sequence. Then, when generating PDF from IF, the same code that generates the synthesized mappings when going straight from FO to PDF is called to recreate the mappings. So I don't think we can say there is information about the mappings in the text nodes.

On 4/23/13 5:50 AM, Glenn Adams wrote:
> Ah, I reread your earlier (private) message. I see the problem has to do with the use of synthesized PUA (Private Use Area) mappings. Ideally, a font would always have a CMAP entry for every glyph that the GSUB process can produce. Not all fonts do, so in the case in point we have to synthesize a mapping, and for that we turn to PUA assignments. This works when we generate PDF directly, since we generate a subset font that contains the synthesized mappings. However, I can see that if the output goes to IF instead of PDF/PS, then we need to find a way to recreate those synthesized mappings.
>
> I think this information is really font-specific, and should not be tied to specific text nodes though. So if Simon's fix uses text nodes, then that is probably not the best approach.



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2
I don't like this. It negates any additional processing that may have occurred, such as letter spacing. It requires the IF to repeat part of the layout process. Bad idea.


On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <[hidden email]> wrote:

> With the approach implemented by Simon, what gets written to the IF file is the original sequence, not the mapped sequence. Then, when generating PDF from IF, the same code that generates the synthesized mappings when going straight from FO to PDF is called to recreate the mappings. So I don't think we can say there is information about the mappings in the text nodes.



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Chris Bowditch
Hi Glenn,

Can you suggest an alternative approach please?

Thanks,

Chris

On 24/04/2013 02:41, Glenn Adams wrote:

> I don't like this. It negates any additional processing that may have
> occurred, such as letter spacing. It requires the IF to repeat part of
> the layout process. Bad idea.

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2
Sure. One way to do this would be to add child elements to the <font/> element in IF output as follows:

<font family="Lateef" style="normal" ...>
  <pua code="0xE000" gid="139"/>
  <pua code="0xE001" gid="481"/>
  <pua code="0xE002" gid="219"/>
</font>

where these PUA mappings are collected by iterating over the characters of the TextAreas governed by the <font/> element. The iteration could happen when TextArea.add{Word,Space} is invoked, with the info collected in the text areas.

Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine which glyph codes were referenced by the document, (2) given these used codes, iterate over the CMAP mappings to find which PUA codes were generated for those glyph codes, and then (3) output the <pua/> elements (above) as required.
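
Steps (1) to (3) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not FOP code: `pua_elements` is a hypothetical helper, the cmap is a plain dict of code point to glyph ID, the set of used glyphs stands in for whatever MultiByteFont.getUsedGlyphs() would report, and the real IF format uses namespaced elements that are ignored here.

```python
# Illustrative sketch only; the helper and its inputs are assumptions, not FOP's API.
import xml.etree.ElementTree as ET

PUA_RANGE = range(0xE000, 0xF900)  # BMP Private Use Area, U+E000..U+F8FF


def pua_elements(font_attrs, cmap, used_glyphs):
    """Build a <font> element with one <pua/> child per used, PUA-mapped glyph."""
    font = ET.Element("font", font_attrs)
    used = set(used_glyphs)                     # (1) glyphs the document referenced
    for code in sorted(cmap):                   # (2) walk the cmap for PUA codes
        gid = cmap[code]
        if code in PUA_RANGE and gid in used:
            ET.SubElement(                      # (3) emit the <pua/> element
                font, "pua", {"code": f"0x{code:04X}", "gid": str(gid)}
            )
    return font


# Hypothetical data: gid 77 has a PUA code but was never used, so it is skipped.
cmap = {0x0628: 12, 0xE000: 139, 0xE001: 481, 0xE002: 219, 0xE003: 77}
font = pua_elements(
    {"family": "Lateef", "style": "normal"}, cmap, used_glyphs=[12, 139, 481, 219]
)
print(ET.tostring(font, encoding="unicode"))
```

Emitting only the used glyphs keeps the IF output small, at the cost of needing the full used-glyph set before the <font/> element is written.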

Finally, when reading an IF file, these <pua/> elements would be used to augment the font's CMAP (keeping in mind that when reading the font, MultiByteFont.createPrivateUseMappings() may have already been called, and thus the mappings in <pua/> elements may need to be replaced or merged).
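
The read side might look something like this. Again a Python sketch rather than FOP code: `merge_pua` is a hypothetical name, and the policy that a <pua/> entry from the IF file overrides a locally synthesized mapping for the same code point is an assumption; the message above leaves open whether to replace or merge.

```python
# Illustrative sketch only; the merge policy (IF file wins) is an assumption.
import xml.etree.ElementTree as ET


def merge_pua(cmap, font_xml):
    """Return a copy of cmap augmented with the <pua/> mappings in font_xml.

    cmap     -- dict of code point -> glyph ID, possibly already holding
                PUA entries synthesized when the font was loaded
    font_xml -- string holding a <font> element with <pua code= gid=/> children
    """
    merged = dict(cmap)
    for pua in ET.fromstring(font_xml).iter("pua"):
        code = int(pua.get("code"), 16)  # accepts the "0x" prefix with base 16
        gid = int(pua.get("gid"))
        merged[code] = gid               # replace any conflicting local mapping
    return merged


# Hypothetical case: the loader mapped U+E000 to gid 481, the IF file disagrees.
local_cmap = {0x0628: 12, 0xE000: 481}
if_font = (
    '<font family="Lateef" style="normal">'
    '<pua code="0xE000" gid="139"/>'
    '<pua code="0xE001" gid="481"/>'
    "</font>"
)
merged = merge_pua(local_cmap, if_font)
```

Letting the IF file win keeps the text in the IF consistent with the codes it was written with, which matters more than preserving whatever the loader happened to assign.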

I can imagine various other optimizations on the above theme to make this readily workable.



On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <[hidden email]> wrote:
> Hi Glenn,
>
> Can you suggest an alternative approach please?
>
> Thanks,
>
> Chris



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Luis Bernardo

These are good suggestions. I am fully aware of the shortcomings that you pointed out, but the only other option seemed to be to encode the mappings in IF, similar to your first suggestion. However, that would mean changing IF, which is not something we are keen to do since it impacts applications that rely on the current format.

Are you saying that with your second approach there is no need to change IF?

On 4/24/13 7:38 PM, Glenn Adams wrote:
> Sure. One way to do this would be to add child elements to the <font/> element in IF output as follows:
>
> <font family="Lateef" style="normal" ...>
>   <pua code="0xE000" gid="139"/>
>   <pua code="0xE001" gid="481"/>
>   <pua code="0xE002" gid="219"/>
> </font>
>
> where these PUA mappings are collected by iterating over the characters of the TextAreas governed by the <font/> element. The iteration could happen when TextArea.add{Word,Space} is invoked, with the info collected in the text areas.
>
> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine which glyph codes were referenced by the document, (2) given these used codes, iterate over the CMAP mappings to find which PUA codes were generated for those glyph codes, and then (3) output the <pua/> elements (above) as required.
>
> Finally, when reading an IF file, these <pua/> elements would be used to augment the font's CMAP (keeping in mind that when reading the font, MultiByteFont.createPrivateUseMappings() may have already been called, and thus the mappings in <pua/> elements may need to be replaced or merged).
>
> I can imagine various other optimizations on the above theme to make this readily workable.




Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2
I see no option but to modify IF. We modified IF for 1.1 in the first place.  We have recently made quite a number of backward incompatible changes to the FOP public APIs. I expect the next release will need to bump the major version to 2 for FOP due to these changes, so there is little risk in making a change in IF. If there are other, useful changes to IF that have been postponed, then perhaps they should be reconsidered now as well.


On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <[hidden email]> wrote:

> These are good suggestions. I am fully aware of the shortcomings that you pointed out, but the only other option seemed to be to encode the mappings in IF, similar to your first suggestion. However, that would mean changing IF, which is not something we are keen to do since it impacts applications that rely on the current format.
>
> Are you saying that with your second approach there is no need to change IF?


On 4/24/13 7:38 PM, Glenn Adams wrote:
Sure. One way to do this would be to add child elements to the <font/> element in IF output as follows:

<font family="Lateef" style="normal" ...>
  <pua code="0xE000" gid="139"/>
  <pua code="0xE001" gid="481"/>
  <pua code="0xE002" gid="219"/>
</font>

where these PUA mappings are collected by iterating over the characters of TextAreas governed by the <font/> element. These characters might be iterated upon invoking TextArea.add{Word,Space}, and collecting this info in text areas.

Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine which glyph codes were referenced by the document, (2) given these used codes, iterate over the CMAP mappings to find which PUA codes were generated for those glyph codes, then (3) output the <pua/> elements (above) as required.
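
A minimal sketch of steps (1) to (3), for illustration only. It works on plain Java collections rather than on FOP classes: PuaElementWriter and toPuaElements are hypothetical names, the set of used glyphs stands in for whatever MultiByteFont.getUsedGlyphs() provides, and the map stands in for the font's CMAP after the synthesized entries have been added. It also assumes that every PUA entry pointing at a used glyph should be written out, including any PUA entries the font itself already defined.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper, not part of FOP's API. */
final class PuaElementWriter {

    /** True if the code point lies in the BMP Private Use Area (U+E000..U+F8FF). */
    static boolean isPua(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF;
    }

    /**
     * @param usedGlyphs glyph indices referenced by the document (assumed input)
     * @param cmap       code point to glyph index, including synthesized entries
     * @return one pua element per used PUA mapping, ordered by code point
     */
    static String toPuaElements(Set<Integer> usedGlyphs, Map<Integer, Integer> cmap) {
        // A TreeMap keeps the output order stable regardless of the input map type.
        Map<Integer, Integer> selected = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : cmap.entrySet()) {
            if (isPua(e.getKey()) && usedGlyphs.contains(e.getValue())) {
                selected.put(e.getKey(), e.getValue());
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : selected.entrySet()) {
            sb.append(String.format(Locale.ROOT,
                    "<pua code=\"0x%04X\" gid=\"%d\"/>\n", e.getKey(), e.getValue()));
        }
        return sb.toString();
    }
}
```

Because the selection is driven by the font's used glyphs and not by individual text nodes, the output stays font-specific, which is the property asked for earlier in the thread.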

Finally, when reading an IF file, these <pua/> elements would be used to augment the font's CMAP (keeping in mind that when reading the font, MultiByteFont.createPrivateUseMappings() may have already been called, and thus the mappings in <pua/> elements may need to be replaced or merged).

I can imagine various other optimizations on the above theme to make this readily workable.
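
The read side could look roughly like the sketch below. It is only an illustration under stated assumptions: PuaElementReader, readPua and mergeInto are hypothetical names, the <pua/> element and its code/gid attributes follow the proposal above and are not part of the existing IF format, the CMAP is modelled as a plain map, and XML namespaces are ignored. The merge policy chosen here is "the IF file wins": a mapping read from the file replaces any entry that was synthesized for the same code point when the font was loaded.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper, not part of FOP's API. */
final class PuaElementReader {

    /** Parses a font element and returns its pua children as code point to glyph index. */
    static Map<Integer, Integer> readPua(String fontXml) {
        try {
            Element font = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fontXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            NodeList nodes = font.getElementsByTagName("pua");
            Map<Integer, Integer> result = new HashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element pua = (Element) nodes.item(i);
                // Integer.decode accepts the "0xE000" hexadecimal notation.
                int code = Integer.decode(pua.getAttribute("code"));
                int gid = Integer.parseInt(pua.getAttribute("gid"));
                result.put(code, gid);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read pua elements", e);
        }
    }

    /** Merges IF mappings into the CMAP; entries from the IF file replace existing ones. */
    static void mergeInto(Map<Integer, Integer> cmap, Map<Integer, Integer> fromIf) {
        cmap.putAll(fromIf);
    }
}
```

Replacing on conflict keeps the code points stored in the IF text consistent with the glyphs they were laid out with. A stricter variant could instead reject a file whose mappings disagree with the font.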



On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <[hidden email]> wrote:
Hi Glenn,

Can you suggest an alternative approach please?

Thanks,

Chris


On 24/04/2013 02:41, Glenn Adams wrote:
I don't like this. It negates any additional processing that may have occurred, such as letter spacing. It requires the IF to repeat part of the layout process. Bad idea.


On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <[hidden email]> wrote:


    With the approach implemented by Simon what gets written to the IF
    file is the original sequence, not the mapped sequence. Then when
    generating PDF from IF the same code that would generate the
    synthesized mappings when generating PDF straight from FO is
    called to recreate the mappings. So I don't think we can say there
    is information about the mappings in the text nodes.


    On 4/23/13 5:50 AM, Glenn Adams wrote:
    Ah, I reread your earlier (private) message. I see the problem
    has to do with the use of synthesized PUA mappings. Here, the
    problem really is that the font should always have a CMAP entry
    that maps to every glyph that can be produced by the GSUB
    process. However, not all fonts do this, so in the case in point,
    we have to synthesize some mapping, from which we have to turn to
    PUA assignments. This works when we generate PDF since we
    generate a subset font that contains the synthesized mappings.
    However, I can see that if this is going to IF instead of PDF/PS,
    then we need to find a way to recreate those synthesized mappings.

    I think this information is really font-specific, and should not
    be tied to specific text nodes though. So if Simon's fix uses
    text nodes, then that is probably not the best approach.


    On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <[hidden email]>
    wrote:

        I'm at W3C WG meetings this week, but I'll try to get this
        on my schedule. I'm not sure what the IF->PS/PDF problem
        is, since the IF->PDF path is clearly working from my tests.


        On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
        <[hidden email]> wrote:


            Glenn,

            Can you give your opinion about the approach used by
            Simon? As I mentioned before (in a private message), the
            IF -> PS/PDF route does not work in your original CS
            patch (for the languages that CS targets) due to the
            mapped sequences. Simon's approach works but requires
            keeping the original sequences alongside the mapped ones.
            I think it is a good approach but I would like to know if
            you have a better suggestion before we apply the patch.

            Thanks,
            Luis


            On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:

                [
                https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
                ]

                Chris Bowditch reassigned FOP-2210:
                -----------------------------------

                     Assignee: Chris Bowditch

                    [PATCH] Complex script IF to output missing glyphs
                    --------------------------------------------------

                                     Key: FOP-2210
                                     URL:
                    https://issues.apache.org/jira/browse/FOP-2210
                                 Project: Fop
                              Issue Type: Bug
                                Reporter: simon steiner
                                Assignee: Chris Bowditch
                             Attachments: csspeedtrunk.patch,
                    fop.xconf, test.fo


                    fop test.fo -c fop.xconf -if application/pdf expected.if.xml
                    fop -c fop.xconf -ifin expected.if.xml out.pdf

                --
                This message is automatically generated by JIRA.
                If you think it was sent incorrectly, please contact
                your JIRA administrators
                For more information on JIRA, see:
                http://www.atlassian.com/software/jira











Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Alexios Giotis
For our use cases, it would be much better to add new child elements to IF, or make similar extensions, than to repeat part of the costly layout process. Besides the repetition, the FO -> IF step is easily executed by multiple threads, while the IF -> PDF step cannot be parallelised (without big changes).




Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Chris Bowditch
In reply to this post by Glenn Adams-2
Hi Glenn, Luis,

It's true that modifying IF can cause problems for older programs
designed to modify it. However, since none of those older applications
work with CS, I'm confident that the additional elements proposed
by Glenn shouldn't be a problem.

Thanks,

Chris



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Vincent Hennebert-2
In reply to this post by Alexios Giotis
On 25/04/13 10:35, Alexios Giotis wrote:
> For our use cases, it would be much better to add new child elements to IF, or make similar extensions, than to repeat part of the costly layout process. Besides the repetition, the FO -> IF step is easily executed by multiple threads, while the IF -> PDF step cannot be parallelised (without big changes).

It doesn’t shock me to store text as text in the IF and to re-do the
glyph mapping when rendering it to the final output format. This is
actually how it is done ATM.

Sure it may become more costly when you start using complex scripts, but
that would have to be confirmed with some profiling first and foremost.
We might be surprised.

We should keep in mind that it’s a perfectly reasonable use case to add
text to the IF as part of a post-processing step. That text will have to
go through the glyph mapping code anyway.

Also, to have copy-paste work properly from PDF the original text must
be present in the IF.

Storing information about the private use area in the IF is exposing
internal implementation details of FOP. When going the direct FO to PDF
route, mapping glyphs to character codes to re-map them again into
glyphs when creating the PDF is sub-optimal. We might as well work with
the glyph indices all the way through.


Vincent



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2

On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <[hidden email]> wrote:

> It doesn’t shock me to store text as text in the IF and to re-do the
> glyph mapping when rendering it to the final output format. This is
> actually how it is done ATM.

I think this is a bad idea, both for the reasons Alexios mentioned and for the one I raised earlier about having to recreate sufficient layout context to repeat the process reliably.
 

> Sure it may become more costly when you start using complex scripts, but
> that would have to be confirmed with some profiling first and foremost.
> We might be surprised.
>
> We should keep in mind that it’s a perfectly reasonable use case to add
> text to the IF as part of a post-processing step. That text will have to
> go through the glyph mapping code anyway.
>
> Also, to have copy-paste work properly from PDF the original text must
> be present in the IF.

Agreed, but this is a different requirement, and it doesn't entail reconstructing part of the layout context and repeating the character-to-glyph mapping and positioning process.
 
> Storing information about the private use area in the IF is exposing
> internal implementation details of FOP.

I disagree. In fact, it is working around a bug that exists in certain fonts which forces FOP to make use of synthesized PUA mappings. The bug is that the font designer did not fully populate the original CMAP, i.e., include a mapping for every accessible glyph.
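
To make the workaround concrete, here is a rough sketch of what synthesizing PUA mappings amounts to. It is not FOP's MultiByteFont.createPrivateUseMappings() code: PuaSynthesizer and synthesize are hypothetical names, the CMAP is modelled as a plain map from code point to glyph index, and the sketch assumes glyph 0 (.notdef) never needs a mapping and that free code points are handed out in ascending order from U+E000.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical illustration, not FOP's implementation. */
final class PuaSynthesizer {
    static final int PUA_FIRST = 0xE000;
    static final int PUA_LAST = 0xF8FF;

    /**
     * Returns a copy of the CMAP in which every glyph index from 1 to
     * glyphCount - 1 that had no code point is assigned a free PUA code point.
     */
    static Map<Integer, Integer> synthesize(Map<Integer, Integer> cmap, int glyphCount) {
        Set<Integer> mapped = new HashSet<>(cmap.values());
        Map<Integer, Integer> result = new TreeMap<>(cmap);
        int next = PUA_FIRST;
        for (int gid = 1; gid < glyphCount; gid++) { // glyph 0 is .notdef
            if (mapped.contains(gid)) {
                continue; // the font already maps some code point to this glyph
            }
            // Skip PUA code points the font already uses itself.
            while (next <= PUA_LAST && result.containsKey(next)) {
                next++;
            }
            if (next > PUA_LAST) {
                break; // PUA exhausted; remaining glyphs stay unmapped
            }
            result.put(next++, gid);
        }
        return result;
    }
}
```

The assignment depends on the font's glyph count and on which code points are already taken, so a reader of the IF file can only trust the PUA code points in the text if it reproduces exactly the same assignment or is told what it was.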
 
> When going the direct FO to PDF
> route, mapping glyphs to character codes to re-map them again into
> glyphs when creating the PDF is sub-optimal. We might as well work with
> the glyph indices all the way through.

This is possible, but wouldn't it require two separate paths through the IF layer, and would it not work for non-PDF output? I suspect this falls under the category of "premature optimization", on which Knuth says "Premature optimization is the root of all evil (or at least most of it) in programming."
 


Vincent


> On 25 Apr 2013, at 01:52, Glenn Adams <[hidden email]> wrote:
>
>> I see no option but to modify IF. We modified IF for 1.1 in the first place.  We have recently made quite a number of backward incompatible changes to the FOP public APIs. I expect the next release will need to bump the major version to 2 for FOP due to these changes, so there is little risk in making a change in IF. If there are other, useful changes to IF that have been postponed, then perhaps they should be reconsidered now as well.
>>
>>
>> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <[hidden email]> wrote:
>>
>> These are good suggestions. I am fully aware of the shortcomings that you pointed out, but the only other option seemed to be to codify the mappings in IF, similar to your first suggestion. However that would mean changing IF which is not something we are keen to do since that impacts applications that rely on the current format.
>>
>> Are you saying that with your second approach there is no need to change IF?
>>
>>
>> On 4/24/13 7:38 PM, Glenn Adams wrote:
>>> Sure. One way to do this would be to add child elements to the <font/> element in IF output as follows:
>>>
>>> <font family="Lateef" style="normal" ...>
>>>   <pua code="0xE000" gid="139"/>
>>>   <pua code="0xE001" gid="481"/>
>>>   <pua code="0xE002" gid="219"/>
>>> </font>
>>>
>>> where these PUA mappings are collected by iterating over the characters of TextAreas governed by the <font/> element. These characters might be iterated upon invoking TextArea.add{Word,Space}, and collecting this info in text areas.
>>>
>>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1) determine which glyph codes were referenced by the document, (2) given these used codes, iterate over the CMAP mappings to find which PUA codes were generated for those glyph codes, then (3) output the <pua/> elements (above) as required.
>>>
>>> Finally, when reading an IF file, these <pua/> elements would be used to augment the font's CMAP (keeping in mind that when reading the font, MultiByteFont.createPrivateUseMappings() may have already been called, and thus the mappings in <pua/> elements may need to be replaced or merged).
>>>
>>> I can imagine various other optimizations on the above theme to make this readily workable.
>>>
>>>
>>>
>>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <[hidden email]> wrote:
>>> Hi Glenn,
>>>
>>> Can you suggest an alternative approach please?
>>>
>>> Thanks,
>>>
>>> Chris
>>>
>>>
>>> On 24/04/2013 02:41, Glenn Adams wrote:
>>> I don't like this. It negates any additional processing that may have occurred, such as letter spacing. It requires the IF to repeat part of the layout process. Bad idea.
>>>
>>>
>>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <[hidden email]> wrote:
>>>
>>>
>>>     With the approach implemented by Simon what gets written to the IF
>>>     file is the original sequence, not the mapped sequence. Then when
>>>     generating PDF from IF the same code that would generate the
>>>     synthesized mappings when generating PDF straight from FO is
>>>     called to recreate the mappings. So I don't think we can say there
>>>     is information about the mappings in the text nodes.
>>>
>>>
>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>>     Ah, I reread your earlier (private) message. I see the problem
>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>     problem really is that the font should always have a CMAP entry
>>>     that maps to every glyph that can be produced by the GSUB
>>>     process. However, not all fonts do this, so in the case in point,
>>>     we have to synthesize some mapping, from which we have to turn to
>>>     PUA assignments. This works when we generate PDF since we
>>>     generate a subset font that contains the synthesized mappings.
>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>     then we need to find a way to recreate those synthesized mappings.
>>>
>>>     I think this information is really font-specific, and should not
>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>     text nodes, then that is probably not the best approach.
>>>
>>>
>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <[hidden email]> wrote:
>>>
>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>         is, since the IF->PDF path is clearly working from my tests.
>>>
>>>
>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo <[hidden email]> wrote:
>>>
>>>
>>>             Glenn,
>>>
>>>             Can you give your opinion about the approach used by
>>>             Simon? As I mentioned before (in a private message), the
>>>             IF -> PS/PDF route does not work in your original CS
>>>             patch (for the languages that CS targets) due to the
>>>             mapped sequences. Simon's approach works but requires
>>>             keeping the original sequences alongside the mapped ones.
>>>             I think it is a good approach but I would like to know if
>>>             you have a better suggestion before we apply the patch.
>>>
>>>             Thanks,
>>>             Luis

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Vincent Hennebert-2
On 25/04/13 17:48, Glenn Adams wrote:

> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <[hidden email]>wrote:
>
>>
>> It doesn’t shock me to store text as text in the IF and to re-do the
>> glyph mapping when rendering it to the final output format. This is
>> actually how it is done ATM.
>>
>
> I think this a bad idea for the reasons that Alexios mentioned, and that I
> previously mentioned about recreating sufficient layout context to repeat
> the process reliably.

What exactly do you mean by ‘sufficient layout context’? What would be
missing from the IF that would prevent re-doing the glyph mapping?


>> Sure it may become more costly when you start using complex scripts,
>> but
>> that would have to be confirmed with some profiling first and foremost.
>> We might be surprised.
>>
>> We should keep in mind that it’s a perfectly reasonable use case to add
>> text to the IF as part of a post-processing step. That text will have to
>> go through the glyph mapping code anyway.
>>
>> Also, to have copy-paste work properly from PDF the original text must
>> be present in the IF.
>>
>
> Agreed, but this is a different requirement. And doesn't entail
> reconstructing part of the layout context and repeating the character to
> glyph mapping and positioning process.

You’ll have to do that for text added at post-process time anyway?


>> Storing information about the private use area in the IF is exposing
>> internal implementation details of FOP.
>
>
> I disagree. In fact, it is working around a bug that exists in certain
> fonts which forces FOP to make use of synthesized PUA mappings. The bug is
> that the font designer did not fully populate the original CMAP, i.e.,
> include a mapping for every accessible glyph.

I still don’t get it, I’m afraid. Where in the TrueType spec is it stated
that every glyph should have an entry in the cmap? Why can’t FOP just
use the glyph ID? Surely that information is enough?


>> When going the direct FO to PDF
>> route, mapping glyphs to character codes to re-map them again into
>> glyphs when creating the PDF is sub-optimal. We might as well work with
>> the glyph indices all the way through.
>>
>
> This is possible, but wouldn't it require two separate paths through the IF
> layer, and would it not work for non-PDF output?

I don’t think so. The original text should be passed through anyway to
create the ToUnicode cmap. So PDF can use the glyph mapping to generate
the text operators and the original text for the ToUnicode cmap. The IF
renderer just streams out the original text. And the other renderers
just deal with the glyph mapping.


Vincent



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2

On Thu, Apr 25, 2013 at 1:08 PM, Vincent Hennebert <[hidden email]> wrote:
> On 25/04/13 17:48, Glenn Adams wrote:
>> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <[hidden email]> wrote:
>>
>>>
>>> It doesn’t shock me to store text as text in the IF and to re-do the
>>> glyph mapping when rendering it to the final output format. This is
>>> actually how it is done ATM.
>>>
>>
>> I think this a bad idea for the reasons that Alexios mentioned, and that I
>> previously mentioned about recreating sufficient layout context to repeat
>> the process reliably.
>
> What exactly do you mean by ‘sufficient layout context’? What would be
> missing from the IF that would prevent to re-do the glyph mapping?

Off hand, we would need:
  • language
  • script
  • font features to be applied (with parameters)
  • letter-spacing settings
There are probably others. I just don't see any reason to use this approach.

>>> Sure it may become more costly when you start using complex scripts,
>>> but
>>> that would have to be confirmed with some profiling first and foremost.
>>> We might be surprised.
>>>
>>> We should keep in mind that it’s a perfectly reasonable use case to add
>>> text to the IF as part of a post-processing step. That text will have to
>>> go through the glyph mapping code anyway.
>>>
>>> Also, to have copy-paste work properly from PDF the original text must
>>> be present in the IF.
>>>
>>
>> Agreed, but this is a different requirement. And doesn't entail
>> reconstructing part of the layout context and repeating the character to
>> glyph mapping and positioning process.
>
> You’ll have to do that for text added at post-process time anyway?

I don't understand what this means.

>>> Storing information about the private use area in the IF is exposing
>>> internal implementation details of FOP.
>>
>>
>> I disagree. In fact, it is working around a bug that exists in certain
>> fonts which forces FOP to make use of synthesized PUA mappings. The bug is
>> that the font designer did not fully populate the original CMAP, i.e.,
>> include a mapping for every accessible glyph.
>
> I still don’t get it I’m afraid. Where in the TrueType spec is it stated
> that every glyph should have an entry in the cmap?

It doesn't. But if someone uses a font, wants to present a glyph that has no mapping, and must use character codes, then it won't work.

> Why can’t FOP just
> use the glyph ID? Surely that information is enough?

Well, for one thing, the IF interface for renderText uses a character string, not a glyph index string, and the IF XML format uses Unicode code points.

>>> When going the direct FO to PDF
>>> route, mapping glyphs to character codes to re-map them again into
>>> glyphs when creating the PDF is sub-optimal. We might as well work with
>>> the glyph indices all the way through.
>>>
>>
>> This is possible, but wouldn't it require two separate paths through the IF
>> layer, and would it not work for non-PDF output?
>
> I don’t think so. The original text should be passed through anyway to
> create the ToUnicode cmap.

Why?

> So PDF can use the glyph mapping to generate
> the text operators and the original text for the ToUnicode cmap. The IF
> renderer just streams out the original text. And the other renderers
> just deal with the glyph mapping.

Since the technique I suggest will work and does not require this, then this (repeating the character to glyph mapping, positioning, and layout process) isn't necessary. I have agreed, however, that embedding the original UC text for performing copy and find operations will be useful, for which there is already an open bug [1].

[1] https://issues.apache.org/jira/browse/FOP-2204
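
The technique referred to here is the <pua/> child elements on the IF <font/> element proposed earlier in the thread. Below is a minimal sketch of the reading side, assuming that element layout. The class and method names are hypothetical, the cmap is a plain map from code point to glyph ID, and letting the IF entries replace any previously synthesized PUA entry for the same glyph is only one possible merge policy.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical illustration only; not part of FOP. */
final class PuaElementSketch {

    static boolean isPua(int codePoint) {
        return codePoint >= 0xE000 && codePoint <= 0xF8FF;
    }

    /** Overlays the <pua code="0x..." gid="..."/> children of fontXml onto cmap. */
    static void mergePua(String fontXml, Map<Integer, Integer> cmap) {
        Element font;
        try {
            font = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(fontXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("unreadable <font/> element", ex);
        }
        NodeList puas = font.getElementsByTagName("pua");
        for (int i = 0; i < puas.getLength(); i++) {
            Element pua = (Element) puas.item(i);
            int code = Integer.decode(pua.getAttribute("code")); // decode() accepts the 0x prefix
            int gid = Integer.parseInt(pua.getAttribute("gid"));
            // Drop a PUA entry synthesized at font-load time for the same glyph,
            // then install the mapping recorded in the IF file.
            cmap.entrySet().removeIf(e -> isPua(e.getKey()) && e.getValue() == gid);
            cmap.put(code, gid);
        }
    }

    public static void main(String[] args) {
        Map<Integer, Integer> cmap = new TreeMap<>();
        cmap.put(0x0628, 1);     // ordinary cmap entry from the font
        cmap.put(0xE005, 139);   // stale mapping synthesized when the font was loaded
        mergePua("<font family=\"Lateef\" style=\"normal\">"
                + "<pua code=\"0xE000\" gid=\"139\"/>"
                + "<pua code=\"0xE001\" gid=\"481\"/>"
                + "<pua code=\"0xE002\" gid=\"219\"/>"
                + "</font>", cmap);
        System.out.println(cmap);
    }
}
```

With a merge like this the text nodes in the IF can keep their PUA code points unchanged, since the font-level table restores the code-to-glyph association on the second run.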

 


> Vincent



Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Vincent Hennebert-2
On 25/04/13 22:33, Glenn Adams wrote:

> On Thu, Apr 25, 2013 at 1:08 PM, Vincent Hennebert <[hidden email]>wrote:
>
>> On 25/04/13 17:48, Glenn Adams wrote:
>>> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <[hidden email]
>>> wrote:
>>>
>>>>
>>>> It doesn’t shock me to store text as text in the IF and to re-do the
>>>> glyph mapping when rendering it to the final output format. This is
>>>> actually how it is done ATM.
>>>>
>>>
>>> I think this a bad idea for the reasons that Alexios mentioned, and that
>> I
>>> previously mentioned about recreating sufficient layout context to repeat
>>> the process reliably.
>>
>> What exactly do you mean by ‘sufficient layout context’? What would be
>> missing from the IF that would prevent to re-do the glyph mapping?
>>
>
> Off hand, we would need:
>
>    - language
>    - script
>    - font features to be applied (with parameters)
>    - letter-spacing settings

Apart from the font features, these are already available in the file.
Regarding font features, they could be added to the font element, but
AFAIK they are not customizable in the FO file, are they? So I guess the
default set of features is applied, and that default set can also be
applied to text coming from the IF.


> There are probably others. I just don't see any reason to use this approach.
>
>
>>
>>
>>>> Sure it may become more costly when you start using complex scripts,
>>>> but
>>>> that would have to be confirmed with some profiling first and foremost.
>>>> We might be surprised.
>>>>
>>>> We should keep in mind that it’s a perfectly reasonable use case to add
>>>> text to the IF as part of a post-processing step. That text will have to
>>>> go through the glyph mapping code anyway.
>>>>
>>>> Also, to have copy-paste work properly from PDF the original text must
>>>> be present in the IF.
>>>>
>>>
>>> Agreed, but this is a different requirement. And doesn't entail
>>> reconstructing part of the layout context and repeating the character to
>>> glyph mapping and positioning process.
>>
>> You’ll have to do that for text added at post-process time anyway?
>>
>
> I don't understand what this means.

The IF can be manipulated in many ways by the user and, among other
things, text can be added to it, which will have to be rendered into the
final output.

This is an important reason why I think glyph mapping should be redone.


>>>> Storing information about the private use area in the IF is
>>>> exposing
>>>> internal implementation details of FOP.
>>>
>>>
>>> I disagree. In fact, it is working around a bug that exists in certain
>>> fonts which forces FOP to make use of synthesized PUA mappings. The bug
>> is
>>> that the font designer did not fully populate the original CMAP, i.e.,
>>> include a mapping for every accessible glyph.
>>
>> I still don’t get it I’m afraid. Where in the TrueType spec is it stated
>> that every glyph should have an entry in the cmap?
>
>
> It doesn't. But if someone uses a font, wants to present a glyph that has
> no mapping, and must use character codes, then it won't work.

That’s this ‘must use character codes’ requirement that seems buggy to
me.


>> Why can’t FOP just
>> use the glyph ID? Surely that information is enough?
>>
>
> Well, for one thing, the IF interface for renderText uses a character
> string, not a glyph index string,

No, it uses Unicode code points. It must probably be extended to pass
information about the glyph mapping as well.


> and the IF XML format uses Unicode code
> points.
>
>
>>
>>
>>>> When going the direct FO to PDF
>>>> route, mapping glyphs to character codes to re-map them again into
>>>> glyphs when creating the PDF is sub-optimal. We might as well work with
>>>> the glyph indices all the way through.
>>>>
>>>
>>> This is possible, but wouldn't it require two separate paths through the
>> IF
>>> layer, and would it not work for non-PDF output?
>>
>> I don’t think so. The original text should be passed through anyway to
>> create the ToUnicode cmap.
>
>
> Why?

For copy/pasting to work in PDF. The original text must be returned.
This is also important for accessibility (reading the text aloud).


>> So PDF can use the glyph mapping to generate
>> the text operators and the original text for the ToUnicode cmap. The IF
>> renderer just streams out the original text. And the other renderers
>> just deal with the glyph mapping.
>>
>
> Since the technique I suggests will work and does not require this, then
> this (repeating the character to glyph mapping, positioning, and layout
> process) isn't necessary. I have agreed, however, that embedding the
> original UC text for performing copy and find operations will be useful,
> for which there is already an open bug [1].
>
> [1] https://issues.apache.org/jira/browse/FOP-2204
>
>
>>
>>
>> Vincent
>>
>>
>>> I suspect this falls under
>>> the category of "premature optimization", on which Knuth says "Premature
>>> optimization is the root of all evil (or at least most of it) in
>>> programming."
>>>
>>>
>>>>
>>>>
>>>> Vincent
>>>>
>>>>
>>>>> On 25 Apr 2013, at 01:52, Glenn Adams <[hidden email]> wrote:
>>>>>
>>>>>> I see no option but to modify IF. We modified IF for 1.1 in the first
>>>> place.  We have recently made quite a number of backward incompatible
>>>> changes to the FOP public APIs. I expect the next release will need to
>> bump
>>>> the major version to 2 for FOP due to these changes, so there is little
>>>> risk in making a change in IF. If there are other, useful changes to IF
>>>> that have been postponed, then perhaps they should be reconsidered now
>> as
>>>> well.
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 24, 2013 at 3:26 PM, Luis Bernardo <
>> [hidden email]>
>>>> wrote:
>>>>>>
>>>>>> These are good suggestions. I am fully aware of the shortcomings that
>>>> you pointed out, but the only other option seemed to be to codify the
>>>> mappings in IF, similar to your first suggestion. However that would
>> mean
>>>> changing IF which is not something we are keen to do since that impacts
>>>> applications that rely on the current format.
>>>>>>
>>>>>> Are you saying that with your second approach there is no need to
>>>> change IF?
>>>>>>
>>>>>>
>>>>>> On 4/24/13 7:38 PM, Glenn Adams wrote:
>>>>>>> Sure. One way to do this would be to add child elements to the
>> <font/>
>>>> element in IF output as follows:
>>>>>>>
>>>>>>> <font family="Lateef" style="normal" ...>
>>>>>>>   <pua code="0xE000" gid="139"/>
>>>>>>>   <pua code="0xE001" gid="481"/>
>>>>>>>   <pua code="0xE002" gid="219"/>
>>>>>>> </font>
>>>>>>>
>>>>>>> where these PUA mappings are collected by iterating over the
>>>> characters of TextAreas governed by the <font/> element. These
>> characters
>>>> might be iterated upon invoking TextArea.add{Word,Space}, and collecting
>>>> this info in text areas.
>>>>>>>
>>>>>>> Alternatively, MultiByteFont.getUsedGlyphs() could be used to (1)
>>>> determine which glyph codes were referenced by the document, (2) given
>>>> these used codes, iterate of the the CMAP mappings to find which PUA
>> codes
>>>> were generated for those glyph codes, then (3) output the <pua/>
>> elements
>>>> (above) as required.
>>>>>>>
>>>>>>> Finally, when reading an IF file, these <pua/> elements would be used
>>>> to augment the font's CMAP (keeping in mind that when reading the font,
>>>> MultiByteFont.createPrivateUseMappings() may have already been called,
>> and
>>>> thus the mappings in <pua/> elements may need to be replaced or merged.
>>>>>>>
>>>>>>> I can imagine various other optimizations on the above theme to make
>>>> this readily workable.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 24, 2013 at 3:18 AM, Chris Bowditch <
>>>> [hidden email]> wrote:
>>>>>>> Hi Glenn,
>>>>>>>
>>>>>>> Can you suggest an alternative approach please?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>> On 24/04/2013 02:41, Glenn Adams wrote:
>>>>>>> I don't like this. It negates any additional processing that may have
>>>> occurred, such as letter spacing. It requires the IF to repeat part of
>> the
>>>> layout process. Bad idea.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 23, 2013 at 3:11 PM, Luis Bernardo <
>> [hidden email]<mailto:
>>>> [hidden email]>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>     With the approach implemented by Simon what gets written to the
>> IF
>>>>>>>     file is the original sequence, not the mapped sequence. Then when
>>>>>>>     generating PDF from IF the same code that would generate the
>>>>>>>     synthesized mappings when generating PDF straight from FO is
>>>>>>>     called to recreate the mappings. So I don't think we can say
>> there
>>>>>>>     is information about the mappings in the text nodes.
>>>>>>>
>>>>>>>
>>>>>>>     On 4/23/13 5:50 AM, Glenn Adams wrote:
>>>>>>>     Ah, I reread your earlier (private) message. I see the problem
>>>>>>>     has to do with the use of synthesized PUA mappings. Here, the
>>>>>>>     problem really is that the font should always have a CMAP entry
>>>>>>>     that maps to every glyph that can be produced by the GSUB
>>>>>>>     process. However, not all fonts do this, so in the case in point,
>>>>>>>     we have to synthesize some mapping, from which we have to turn to
>>>>>>>     PUA assignments. This works when we generate PDF since we
>>>>>>>     generate a subset font that contains the synthesized mappings.
>>>>>>>     However, I can see that if this is going to IF instead of PDF/PS,
>>>>>>>     then we need to find a way to recreate those synthesized
>> mappings.
>>>>>>>
>>>>>>>     I think this information is really font-specific, and should not
>>>>>>>     be tied to specific text nodes though. So if Simon's fix uses
>>>>>>>     text nodes, then that is probably not the best approach.
>>>>>>>
>>>>>>>
>>>>>>>     On Mon, Apr 22, 2013 at 10:45 PM, Glenn Adams <[hidden email]
>>>>>>>     <mailto:[hidden email]>> wrote:
>>>>>>>
>>>>>>>         I'm presently at W3C WG meetings this week, but I'll try to
>>>>>>>         get on my schedule. I'm not sure what the IF->PS/PDF problem
>>>>>>>         is, since the IF->PDF path is clearly working from my tests.
>>>>>>>
>>>>>>>
>>>>>>>         On Mon, Apr 22, 2013 at 4:27 PM, Luis Bernardo
>>>>>>>         <[hidden email] <mailto:[hidden email]>>
>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>             Glenn,
>>>>>>>
>>>>>>>             Can you give your opinion about the approach used by
>>>>>>>             Simon? As I mentioned before (in a private message), the
>>>>>>>             IF -> PS/PDF route does not work in your original CS
>>>>>>>             patch (for the languages that CS targets) due to the
>>>>>>>             mapped sequences. Simon's approach works but requires
>>>>>>>             keeping the original sequences alongside the mapped ones.
>>>>>>>             I think it is a good approach but I would like to know if
>>>>>>>             you have a better suggestion before we apply the patch.
>>>>>>>
>>>>>>>             Thanks,
>>>>>>>             Luis
>>>>>>>
>>>>>>>
>>>>>>>             On 4/22/13 3:23 PM, Chris Bowditch (JIRA) wrote:
>>>>>>>
>>>>>>>                 [
>>>>>>>
>>>>
>> https://issues.apache.org/jira/browse/FOP-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>>>>>                 ]
>>>>>>>
>>>>>>>                 Chris Bowditch reassigned FOP-2210:
>>>>>>>                 -----------------------------------
>>>>>>>
>>>>>>>                      Assignee: Chris Bowditch
>>>>>>>
>>>>>>>                     [PATCH] Complex script IF to output missing
>> glyphs
>>>>>>>
>> --------------------------------------------------
>>>>>>>
>>>>>>>                                      Key: FOP-2210
>>>>>>>                                      URL:
>>>>>>>                     https://issues.apache.org/jira/browse/FOP-2210
>>>>>>>                                  Project: Fop
>>>>>>>                               Issue Type: Bug
>>>>>>>                                 Reporter: simon steiner
>>>>>>>                                 Assignee: Chris Bowditch
>>>>>>>                              Attachments: csspeedtrunk.patch,
>>>>>>>                     fop.xconf, test.fo <http://test.fo>
>>>>>>>
>>>>>>>
>>>>>>>                     fop test.fo <http://test.fo> -c fop.xconf -if
>>>>>>>
>>>>>>>                     application/pdf expected.if.xml
>>>>>>>                     fop -c fop.xconf -ifin expected.if.xml out.pdf
>>>>>>>
>>>>>>>                 --
>>>>>>>                 This message is automatically generated by JIRA.
>>>>>>>                 If you think it was sent incorrectly, please contact
>>>>>>>                 your JIRA administrators
>>>>>>>                 For more information on JIRA, see:
>>>>>>>                 http://www.atlassian.com/software/jira
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Glenn Adams-2

On Fri, Apr 26, 2013 at 12:59 PM, Vincent Hennebert <[hidden email]> wrote:
On 25/04/13 22:33, Glenn Adams wrote:
> On Thu, Apr 25, 2013 at 1:08 PM, Vincent Hennebert <[hidden email]>wrote:
>
>> On 25/04/13 17:48, Glenn Adams wrote:
>>> On Thu, Apr 25, 2013 at 2:31 AM, Vincent Hennebert <[hidden email]
>>> wrote:
>>>
>>>>
>>>> It doesn’t shock me to store text as text in the IF and to re-do the
>>>> glyph mapping when rendering it to the final output format. This is
>>>> actually how it is done ATM.
>>>>
>>>
>>> I think this a bad idea for the reasons that Alexios mentioned, and that
>> I
>>> previously mentioned about recreating sufficient layout context to repeat
>>> the process reliably.
>>
>> What exactly do you mean by ‘sufficient layout context’? What would be
>> missing from the IF that would prevent to re-do the glyph mapping?
>>
>
> Off hand, we would need:
>
>    - language
>    - script
>    - font features to be applied (with parameters)
>    - letter-spacing settings

Apart from the font features, they are already available in the file.
Regarding font features, they could be added to the font element, but
AFAIK this is not customizable in the FO file is it? So I guess the
default set of features is applied. So that default set can also be
applied to text coming from the IF.

I don't believe language and script are specifiable at a per-text level in IF.

      <xs:element name="text">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="x" use="required" type="mf:lengthType"/>
              <xs:attribute name="y" use="required" type="mf:lengthType"/>
              <xs:attribute name="letter-spacing" type="mf:lengthType"/>
              <xs:attribute name="word-spacing" type="mf:lengthType"/>
              <xs:attribute name="dx" type="mf:lengthListType"/>
              <xs:attribute name="dp" type="mf:dpListType"/>
              <xs:attribute name="hyphenated" type="xs:boolean"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>

Yes, at present, the default features apply. However, I added code already to allow an extension property to specify features, such as defined by [1], which is on my short list of planned upgrades.


I agree in principle that these could be added to the IF data as well, so we would need to add at least the following attributes:

language
script
font-feature-settings
 


> There are probably others. I just don't see any reason to use this approach.
>
>
>>
>>
>>>> Sure it may become more costly when you start using complex scripts,
>>>> but
>>>> that would have to be confirmed with some profiling first and foremost.
>>>> We might be surprised.
>>>>
>>>> We should keep in mind that it’s a perfectly reasonable use case to add
>>>> text to the IF as part of a post-processing step. That text will have to
>>>> go through the glyph mapping code anyway.
>>>>
>>>> Also, to have copy-paste work properly from PDF the original text must
>>>> be present in the IF.
>>>>
>>>
>>> Agreed, but this is a different requirement. And doesn't entail
>>> reconstructing part of the layout context and repeating the character to
>>> glyph mapping and positioning process.
>>
>> You’ll have to do that for text added at post-process time anyway?
>>
>
> I don't understand what this means.

The IF can be manipulated in many ways by the user and, among other
things, text can be added to it, which will have to be rendered into the
final output.

This is an important reason why I think glyph mapping should be redone.

Hmm, without re-layout? Seems risky, but I agree it's possible.
 
>>>> Storing information about the private use area in the IF is
>>>> exposing
>>>> internal implementation details of FOP.
>>>
>>>
>>> I disagree. In fact, it is working around a bug that exists in certain
>>> fonts which forces FOP to make use of synthesized PUA mappings. The bug
>> is
>>> that the font designer did not fully populate the original CMAP, i.e.,
>>> include a mapping for every accessible glyph.
>>
>> I still don’t get it I’m afraid. Where in the TrueType spec is it stated
>> that every glyph should have an entry in the cmap?
>
>
> It doesn't. But if someone uses a font, wants to present a glyph that has
> no mapping, and must use character codes, then it won't work.

That’s this ‘must use character codes’ requirement that seems buggy to
me.


>> Why can’t FOP just
>> use the glyph ID? Surely that information is enough?
>>
>
> Well, for one thing, the IF interface for renderText uses a character
> string, not a glyph index string,

No, it uses Unicode code points. It must probably be extended to pass
information about the glyph mapping as well.

character string = (Unicode code point)*

These are not glyph codes.
 


> and the IF XML format uses Unicode code
> points.
>
>
>>
>>
>>>> When going the direct FO to PDF
>>>> route, mapping glyphs to character codes to re-map them again into
>>>> glyphs when creating the PDF is sub-optimal. We might as well work with
>>>> the glyph indices all the way through.
>>>>
>>>
>>> This is possible, but wouldn't it require two separate paths through the
>> IF
>>> layer, and would it not work for non-PDF output?
>>
>> I don’t think so. The original text should be passed through anyway to
>> create the ToUnicode cmap.
>
>
> Why?

For copy/pasting to work in PDF. The original text must be returned.
This is also important for accessibility (reading the text aloud).

I've already agreed that having the original text is important, and it's not there at present, so this is a bug waiting to be solved.
 


>> So PDF can use the glyph mapping to generate
>> the text operators and the original text for the ToUnicode cmap. The IF
>> renderer just streams out the original text. And the other renderers
>> just deal with the glyph mapping.
>>
>
> Since the technique I suggests will work and does not require this, then
> this (repeating the character to glyph mapping, positioning, and layout
> process) isn't necessary. I have agreed, however, that embedding the
> original UC text for performing copy and find operations will be useful,
> for which there is already an open bug [1].
>
> [1] https://issues.apache.org/jira/browse/FOP-2204
>

To summarize:

(1) I agree it is desirable to include the original Unicode text so that copy/find/accessibility can work;
(2) I agree that it is possible to re-perform the character/glyph mapping process provided new attributes are added to the IF text element;
(3) I am not (yet) convinced of the wisdom of supporting modification to the IF text, but I'm open to learning about use cases;
(4) I know that it is possible to satisfy (1) above without having to re-perform the mapping;
(5) I know that the present problem can be solved without doing (1) or (2), e.g., by adding pua children to the font element.
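
For illustration, point (5) could look roughly like the following on the reading side. This is only a sketch under stated assumptions: the <pua code="..." gid="..."/> children are the ones proposed earlier in this thread, the class and method names (PuaMappings, read, mergeInto) are hypothetical and not part of FOP, and the merge policy shown (entries from the IF replace existing ones) is just one of the "replace or merge" options mentioned above.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: reads proposed pua children of an IF font element. */
final class PuaMappings {
    private PuaMappings() { }

    /** Returns a map from PUA code point to glyph index, e.g. 0xE000 -> 139. */
    static Map<Integer, Integer> read(String fontXml) {
        try {
            Element font = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(fontXml)))
                    .getDocumentElement();
            NodeList nodes = font.getElementsByTagName("pua");
            Map<Integer, Integer> mappings = new TreeMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element pua = (Element) nodes.item(i);
                // Integer.decode accepts the "0xE000" hexadecimal notation.
                mappings.put(Integer.decode(pua.getAttribute("code")),
                             Integer.valueOf(pua.getAttribute("gid")));
            }
            return mappings;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read pua mappings", e);
        }
    }

    /**
     * Assumed policy: mappings read from the IF replace any entry the font
     * loader already synthesized for the same code point.
     */
    static void mergeInto(Map<Integer, Integer> cmap, Map<Integer, Integer> fromIf) {
        cmap.putAll(fromIf);
    }
}
```

With the three-entry Lateef example from earlier in the thread, read() would return the three code point to glyph index pairs, and mergeInto() would overwrite whatever the font loader had assigned to U+E000 through U+E002.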

Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Alexios Giotis
On 26 Apr 2013, at 23:45, Glenn Adams <[hidden email]> wrote:

> (3) I am not (yet) convinced of the wisdom of supporting modification to the IF text, but I'm open to learning about use cases;
>



Hi Glenn,

Interesting thread, I will just attempt to describe some use cases where I need to modify the IF text.

1. Printing jobs
This means selecting documents, grouping / sorting them (e.g. group per range of pages so they fit in a certain envelope type and then sort by zip code) and then splitting them into batches of about 20000 pages each. This is done by first rendering each document to IF and then concatenating the IFs into the final output format. I need to first create the IF because:

- The number of pages of each document is not known in advance (e.g. from the XSL:FO) and this is an important criterion for creating batches.
- It is not efficient (or possible) to render documents of 20k or more pages.

While rendering the IF to the final output format, some SAX filters installed after the XMLReader and before FOP modify the IF on the fly. This is typically needed to:
- Add page / sheet / document and other counters across each printing batch and for the whole printing job.
- Add barcodes / OMRs or other symbols that drive the inserter (enveloping machine).
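
As a rough sketch of the filter arrangement described above, the following SAX filter sits between an XMLReader and whatever consumes the events, and keeps a running page count while passing everything through unchanged. The class name and the countPages helper are hypothetical, and it assumes that pages appear in the IF as elements whose local name is "page"; a real filter would also check the IF namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pass-through filter that counts page elements in an IF stream. */
final class PageCountingFilter extends XMLFilterImpl {
    private int pages;

    PageCountingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("page".equals(localName)) { // assumed element name
            pages++;
        }
        // Forward the event untouched to the downstream handler, if any.
        super.startElement(uri, localName, qName, atts);
    }

    int getPageCount() {
        return pages;
    }

    /** Convenience helper: parses a complete IF document held in a string. */
    static int countPages(String ifXml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for reliable local names
            PageCountingFilter filter =
                    new PageCountingFilter(factory.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(ifXml)));
            return filter.getPageCount();
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse IF", e);
        }
    }
}
```

In a batch job the same filter instance could be reused as the reader for several documents in a row, so that the count runs across the whole batch rather than restarting per document.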



2. Fast rendering of documents
Rendering is a resource-intensive process and we need to serve documents quickly regardless of their size. What we do is 'cache' the IF. A user selects a document and then, based on her permissions and the parts she selected, we render part of the cached IF to PDF (no other formats in that case). But there are some parts that we need to change, the most common being the total number of pages (e.g. the N in Page 1 of N). We change it by either replacing a text placeholder with the actual value or by overlaying each cached IF page with a short, dynamically generated one. The first approach is faster but not optimal if we assumed one or two digits for N and it turns out to be a four-digit number.
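
The placeholder variant could be sketched as follows. This is an illustration only: the class name, the rewrite helper and the "@N@" style of placeholder are assumptions, and the filter simply buffers the character data of each text element (SAX may deliver it in several chunks) and substitutes the value before passing it on. It does not touch positioning attributes such as dx, which is where the digit-count problem mentioned above would show up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter that replaces a placeholder inside IF text elements. */
final class PlaceholderFilter extends XMLFilterImpl {
    private final String placeholder;
    private final String value;
    private StringBuilder buffer; // non-null only while inside a text element

    PlaceholderFilter(XMLReader parent, String placeholder, String value) {
        super(parent);
        this.placeholder = placeholder;
        this.value = value;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("text".equals(localName)) {
            buffer = new StringBuilder();
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (buffer != null) {
            buffer.append(ch, start, length); // hold back until the element ends
        } else {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (buffer != null && "text".equals(localName)) {
            char[] out = buffer.toString().replace(placeholder, value).toCharArray();
            buffer = null;
            super.characters(out, 0, out.length);
        }
        super.endElement(uri, localName, qName);
    }

    /** Runs the filter over an IF string and serializes the result with an identity transform. */
    static String rewrite(String ifXml, String placeholder, String value) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader parser = factory.newSAXParser().getXMLReader();
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer().transform(
                    new SAXSource(new PlaceholderFilter(parser, placeholder, value),
                            new InputSource(new StringReader(ifXml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot rewrite IF", e);
        }
    }
}
```

In production the filter would feed FOP's IF parser directly instead of being serialized back to a string; the rewrite helper is only there to make the effect visible.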



3. Rendering really big documents
There is a customer (of one of our customers) that has a monthly invoice of 60k pages and gets it printed. FOP can't render such big documents in a single pass, so we need to modify partial IFs.


We do have other use cases but I hope I described some of them.


Alexios Giotis




Re: [jira] [Assigned] (FOP-2210) [PATCH] Complex script IF to output missing glyphs

Chris Bowditch
Hi Glenn, Alexios,

One of the key requirements when we implemented IF XML was the ability
to make modifications. Thanks to Alex for providing a list of business
reasons why that is necessary. I agree with those use cases. Some of the
others are adding barcodes, OMR marks, and large file page numbering.

We want to see this original requirement for IF XML maintained.

Thanks,

Chris

On 28/04/2013 20:15, Alexios Giotis wrote:

> On 26 Apr 2013, at 23:45, Glenn Adams <[hidden email]> wrote:
>
>> (3) I am not (yet) convinced of the wisdom of supporting modification to the IF text, but I'm open to learning about use cases;
>>
>
>
> Hi Glenn,
>
> Interesting thread, I will just attempt to describe some use cases where I need to modify the IF text.
>
> 1. Printing jobs
> This means selecting documents, grouping / sorting them (e.g. group per range of pages so they fit in a certain envelope type and then sort by zip code) and then splitting them into batches of about 20000 pages each. This is done by first rendering each document to IF and then concatenating the IFs into the final output format. I need to first create the IF because:
>
> - The number of pages of each document is not known in advance (e.g. from the XSL:FO) and this is an important criterion for creating batches.
> - It is not efficient (or possible) to render documents of 20k or more pages.
>
> While rendering the IF to the final output format, some SAX filters installed after the XMLReader and before FOP modify the IF on the fly. This is typically needed to:
> - Add page / sheet / document and other counters across each printing batch and for the whole printing job.
> - Add barcodes / OMRs or other symbols that drive the inserter (enveloping machine).
>
>
>
> 2. Fast rendering of documents
> Rendering is a resource-intensive process and we need to serve documents quickly regardless of their size. What we do is 'cache' the IF. A user selects a document and then, based on her permissions and the parts she selected, we render part of the cached IF to PDF (no other formats in that case). But there are some parts that we need to change, the most common being the total number of pages (e.g. the N in Page 1 of N). We change it by either replacing a text placeholder with the actual value or by overlaying each cached IF page with a short, dynamically generated one. The first approach is faster but not optimal if we assumed one or two digits for N and it turns out to be a four-digit number.
>
>
>
> 3. Rendering really big documents
> There is a customer (of one of our customers) that has a monthly invoice of 60k pages and gets it printed. FOP can't render such big documents in a single pass, so we need to modify partial IFs.
>
>
> We do have other use cases but I hope I described some of them.
>
>
> Alexios Giotis
>
>
>
>
>
