🚧 Colour based tagging & non-ocr page_to_text #111
          
Open: seanmcguire12 wants to merge 48 commits into main from APE-76
  
      
      
   
  
    
  
  
  
 
  
      
Commits (48 total; this view shows changes from 23 commits)
      
- 9ad4726 ✨ Added functionality for taking screenshot of original/raw page prio… (seanmcguire12)
- a9ee07f ✨ Reformatted core.py using ruff. Updated signature of page_to_image … (seanmcguire12)
- 8ce819c 🔨 fix: specified correct type for the combined OCR annotations. (seanmcguire12)
- 81229ab ♻️ Updated naming for _hide_non_tag_element(), moved logic for two ph… (seanmcguire12)
- b1611b3 Merge pull request #95 from reworkd/APE-75 (seanmcguire12)
- 5dd3391 Merge remote-tracking branch 'origin/main' into API-33 (seanmcguire12)
- b4ad5e9 🔧 Removed "words" key from ImageAnnotatorResponse in core.py. (seanmcguire12)
- c597504 Merge remote-tracking branch 'origin/main' into APE-76 (seanmcguire12)
- 0ee8f82 WIP: implemented colour based tagging & page_to_text_new which doesnt… (seanmcguire12)
- 03f05f2 Fix: google creds can be loaded as JSON. Added bananalyzer download t… (seanmcguire12)
- df5cd7e Fixed sticky/fixed element issue. Added data type for coloured elements. (seanmcguire12)
- 5812ff5 Fix: added filtering functionality on getElementBoundingBoxes functio… (seanmcguire12)
- 9c64fd1 implemented sorting before combining annotations to reduce spacing & … (seanmcguire12)
- 384c55f - Added functions for recolouring elements so they can be found with … (seanmcguire12)
- 81025fb Improved color tagging: (seanmcguire12)
- f7bceab added text child tagging (seanmcguire12)
- f51a353 Merge branch 'refs/heads/main' into APE-76
- 943a24c WIP: still need to fix tests and delete debug code (seanmcguire12)
- f6ba5be Merge branch 'main' into APE-76 (seanmcguire12)
- 31a1da3 Fixed missing leaf text issue (seanmcguire12)
- a1eedd7 Merge remote-tracking branch 'refs/remotes/origin/main' into APE-76 (seanmcguire12)
- 385e9c5 📈 Rm debug code, added !important for span styles (seanmcguire12)
- 84a1ecd 📈 Rm transformXpath fn (seanmcguire12)
- a49c673 ✏️ Avoid tagging separator symbols (asim-shrestha)
- 4b8015f Merge branch 'main' into APE-76 (seanmcguire12)
- e644419 Fix: include tags for buttons & icons that don't have text (seanmcguire12)
- dab4508 use words.length instead of boundingBoxes to determine if we should r… (seanmcguire12)
- 03a386d add spacing between tag characters. eg: [$1] becomes [ $ 1 ] (seanmcguire12)
- 4016151 fix: make sure checkboxes are coloured (seanmcguire12)
- 94d1b75 include placeholder text (seanmcguire12)
- 93fd680 reduce height of bounding boxes to mitigate excesive use of ** in tex… (seanmcguire12)
- 3d92214 update bounding box width to accommodate extra spacing inside tarsier… (seanmcguire12)
- fd1c30e Merge branch 'main' into APE-76 (seanmcguire12)
- 362c85b Merge branch 'main' into APE-76 (seanmcguire12)
- b46f249 added tagless functionality for colour tagging (seanmcguire12)
- 46dad31 fix: make sure tag_to_xpath returns xpaths of all coloured elements, … (seanmcguire12)
- d98989a Merge branch 'main' into APE-76 (seanmcguire12)
- 7e2bf7a get the first option of dropdown text if there is no default selected… (seanmcguire12)
- 5b8c4ff Merge branch 'main' into APE-76 (seanmcguire12)
- 3e0b294 added functionality to revert webpage after colour tagging. changed r… (seanmcguire12)
- 0d643e2 Merge branch 'main' into APE-76 (seanmcguire12)
- 960d686 Merge branch 'main' into APE-76 (seanmcguire12)
- 917057c refactor colour tagging (seanmcguire12)
- 4c5edf4 more refactoring, store/restore DOM instead of using revert functions… (seanmcguire12)
- 0e56205 reformat (seanmcguire12)
- 06f1eaf Merge branch 'main' into APE-76 (seanmcguire12)
- 54122e1 update lock (seanmcguire12)
- ed35707 prettier fix (seanmcguire12)
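Commit 03a386d describes adding spacing between tag characters, so that `[$1]` becomes `[ $ 1 ]`. As a rough illustration only, the transformation could look like the Python sketch below. The function name `space_tag`, the regular expression, and the set of prefix symbols (`#`, `@`, `$`) are assumptions made for this example; they are not taken from the PR's code, which may implement the spacing elsewhere and differently.

```python
import re

# Hypothetical pattern (not from the PR): "[", an optional one-character
# prefix symbol, a numeric id, then "]". Matches e.g. "[$1]" or "[12]".
TAG_PATTERN = re.compile(r"\[([#@$]?)(\d+)\]")


def space_tag(text: str) -> str:
    """Insert single spaces between the characters of each compact tag."""

    def _expand(match: re.Match) -> str:
        prefix, tag_id = match.groups()
        # Drop the prefix when it is empty so "[12]" becomes "[ 12 ]".
        parts = [part for part in (prefix, tag_id) if part]
        return "[ " + " ".join(parts) + " ]"

    return TAG_PATTERN.sub(_expand, text)


print(space_tag("[$1]"))
print(space_tag("[12] Ryan Coombs"))
```

Text without tags passes through unchanged, since the substitution only touches bracketed ids.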
tarsier-snapshots/snapshots/05W3ZEmj8pbuYSHArYUkz/ocr.txt (171 changes: 82 additions & 89 deletions)
Large diffs are not rendered by default.
      
    
  
        
          
tarsier-snapshots/snapshots/05W3ZEmj8pbuYSHArYUkz/screenshot.png (binary file modified: -6.72 KB, 99%)
      
    
        
          
          
tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/ocr.txt (440 changes: 207 additions & 233 deletions)
Large diffs are not rendered by default.
      
    
  
        
          
tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/screenshot.png (binary file modified: +3.43 KB, 100%)
      
    
        
          
          
tarsier-snapshots/snapshots/0fdyKSMbc3kVUgL9RGiEk/ocr.txt (77 changes: 37 additions & 40 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,64 +1,61 @@ | ||
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
| [ 0 ] O'MELVENY WORLDWIDE | ||
| [ @ 1 ] | ||
| **' Melveny** [ 2 ] PROFESSIONALS [ 3 ] SERVICES [ 4 ] INSIGHTS [ 5 ] NEWS [ 6 ] LOCATIONS [ 7 ] ABOUT [ @ 8 ] CAREERS [ @ 9 ] ALUMNI ☐ | ||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
| [0] O Melveny Worldwide | ||
| [2] Professionals [3] Services [4] Insights [5] News [6] Locations [7] About [@8] Careers [@9] Alumni | ||
|  | ||
| [ @ 10 ] Our Team > [ 11 ] Ryan Coombs | ||
| [ 20 ] AREAS OF FOCUS | ||
| [ @ 21 ] Bank Finance | ||
| [ @ 22 ] Capital Markets | ||
| [ @ 23 ] Emerging Companies | ||
| [ @ 24 ] Private Equity | ||
| [ @ 25 ] Public Company Advisory | ||
| [@10] Our Team [11] Ryan Coombs | ||
| [20] Areas of Focus | ||
|  | ||
| [@21] Bank Finance | ||
| [@22] Capital Markets | ||
| [@23] Emerging Companies | ||
| [@24] Private Equity | ||
| [@25] Public Company Advisory | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
|  | ||
| [37] O Melveny uses cookies to improve website functionality | ||
| and performance. By continuing to this website, you are | ||
| agreeing to our Cookie Policy. | ||
| [$39] Only use essential cookies [$40] Accept Cookies | ||
|  | ||
|  | ||
| **[12] Ryan Coombs** [@14] San Francisco Office | ||
| [13] Partner | ||
|  | ||
| [@15] rcoombs omm.com [16] D: +1-415-984-8943 | ||
|  | ||
| [@26] Overview [@27] News [@28] Credentials | ||
|  | ||
| [ 12 ] Ryan Coombs [ @ 14 ] San Francisco Office | ||
| [ 13 ] Partner | ||
| **[29] Ryan Coombs structures and executes a broad range of capital** | ||
| **markets transactions for issuers and investment banks, including initial** | ||
| **public offerings and other common and preferred equity offerings,** | ||
| **investment grade and high-yield debt issuances, convertible notes** | ||
| **offerings, SPAC transactions, PIPEs, and other complex capital-raising** | ||
| **transactions. Clients turn to Ryan for counsel on public company** | ||
| **reporting, corporate governance and other corporate matters.** | ||
| [30] Ryan s experience spans a variety of sectors, with an emphasis on technology, and includes social media, | ||
| consumer electronics, and software, as well as entertainment, life sciences, healthcare and renewable energy. | ||
|  | ||
| [ @ 15 ] [email protected] [ 16 ] D: [ @ 17 ] + 1-415-984-8943 [ @ 18 ] [ @ 19 ] | ||
|  | ||
| [ @ 26 ] OVERVIEW [ @ 27 ] NEWS [ @ 28 ] CREDENTIALS | ||
| [31] Related Practices | ||
|  | ||
| [@32] Bank Finance | ||
|  | ||
| **[ 29 ] Ryan Coombs structures and executes a broad range of capital** | ||
| markets transactions for issuers and investment banks, including initial | ||
| public offerings and other common and preferred equity offerings, | ||
| investment grade and high - yield debt issuances, convertible notes | ||
| offerings, SPAC transactions, PIPES, and other complex capital - raising | ||
| **transactions. Clients turn to Ryan for counsel on public company** | ||
| **reporting, corporate governance and other corporate matters.** | ||
| [ 30 ] Ryan's experience spans a variety of sectors, with an emphasis on technology, and includes social media, consumer | ||
| electronics, and software, as well as entertainment, life sciences, healthcare and renewable energy. | ||
| [@33] Capital Markets | ||
|  | ||
| [@34] Emerging Companies | ||
|  | ||
| [ 31 ] RELATED PRACTICES | ||
| [ @ 32 ] Bank Finance | ||
| [@35] Private Equity | ||
|  | ||
| [ @ 33 ] Capital Markets | ||
| [@36] Public Company Advisory | ||
|  | ||
| [ @ 34 ] Emerging Companies | ||
|  | ||
| [ @ 35 ] Private Equity | ||
|  | ||
| [ @ 36 ] Public Company Advisory. | ||
|  | ||
|  | ||
| [ 37 ] O'Melveny uses cookies to improve website functionality | ||
| and performance. By continuing to this website, you are | ||
| agreeing to our [ @ 39 ] Cookie Policy [ 38 ]. | ||
| [ @ 42 ] | ||
| [ 48 ] O'Melveny's latest insights, straight to your inbox [ @ 49 ] Subscribe [ @ 43 ] [ @ 44 ] Only [ @ 45 ] tial Co [ @ 46 ] [ @ 47 ] .ccept Cookies | ||
| O'Melveny | ||
| [ @ 51 ] DISCLAIMER [ @ 52 ] PRIVACY POLICY [ @ 53 ] CONTACT US [ 50 ] ATTORNEY ADVERTISING © 2023 O'MELVENY & MYERS LLP. ALL RIGHTS RESERVED | ||
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
| Token count: 638 | ||
| [47] O Melveny s latest insights, straight to your inbox [@48] Subscribe | ||
| [@50] Disclaimer [@51] Privacy Policy [@52] Contact Us [49] Attorney Advertising 2023 O'Melveny & Myers LLP. All rights reserved | ||
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ||
| Token count: 483 | 
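The snapshot diff above contains tags in two layouts: a spaced one (`[ @ 10 ]`, `[ 12 ]`) and a compact one (`[@10]`, `[12]`). For anyone comparing the two versions of `ocr.txt`, a small Python sketch like the following can pull the tag prefix and id out of either layout. The names `Tag` and `extract_tags` are hypothetical, and the `#` prefix is an assumption: only `@`, `$`, and unprefixed tags appear in this snapshot.

```python
import re
from typing import List, NamedTuple


class Tag(NamedTuple):
    prefix: str  # "" for unprefixed tags such as "[12]"
    tag_id: int


# Tolerates optional whitespace around the prefix and the id, so the same
# pattern matches both "[@10]" and "[ @ 10 ]". Hypothetical, not from the PR.
TAG_RE = re.compile(r"\[\s*([#@$]?)\s*(\d+)\s*\]")


def extract_tags(line: str) -> List[Tag]:
    """Return every tag found in one line of OCR/text output, in order."""
    return [Tag(m.group(1), int(m.group(2))) for m in TAG_RE.finditer(line)]


spaced = extract_tags("[ @ 10 ] Our Team > [ 11 ] Ryan Coombs")
compact = extract_tags("[@10] Our Team [11] Ryan Coombs")
print(spaced == compact)
```

Comparing the extracted tag lists rather than the raw lines makes it easier to see whether a snapshot change altered which elements were tagged or only how the tags are rendered.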
        
          
tarsier-snapshots/snapshots/0fdyKSMbc3kVUgL9RGiEk/screenshot.png (binary file modified: -3.68 KB, 99%)
      
    
      
Not really relevant for setting up tarsier. Would delete.