Changes from 23 commits
48 commits
9ad4726
✨ Added functionality for taking screenshot of original/raw page prio…
seanmcguire12 Jun 26, 2024
a9ee07f
✨ Reformatted core.py using ruff. Updated signature of page_to_image …
seanmcguire12 Jun 26, 2024
8ce819c
🔨 fix: specified correct type for the combined OCR annotations.
seanmcguire12 Jun 26, 2024
81229ab
♻️ Updated naming for _hide_non_tag_element(), moved logic for two ph…
seanmcguire12 Jun 26, 2024
b1611b3
Merge pull request #95 from reworkd/APE-75
seanmcguire12 Jun 26, 2024
5dd3391
Merge remote-tracking branch 'origin/main' into API-33
seanmcguire12 Jul 3, 2024
b4ad5e9
🔧 Removed "words" key from ImageAnnotatorResponse in core.py.
seanmcguire12 Jul 4, 2024
c597504
Merge remote-tracking branch 'origin/main' into APE-76
seanmcguire12 Jul 18, 2024
0ee8f82
WIP: implemented colour based tagging & page_to_text_new which doesnt…
seanmcguire12 Jul 18, 2024
03f05f2
Fix: google creds can be loaded as JSON. Added bananalyzer download t…
seanmcguire12 Jul 22, 2024
df5cd7e
Fixed sticky/fixed element issue. Added data type for coloured elements.
seanmcguire12 Jul 24, 2024
5812ff5
Fix: added filtering functionality on getElementBoundingBoxes functio…
seanmcguire12 Jul 24, 2024
9c64fd1
implemented sorting before combining annotations to reduce spacing & …
seanmcguire12 Jul 25, 2024
384c55f
- Added functions for recolouring elements so they can be found with …
seanmcguire12 Aug 1, 2024
81025fb
Improved color tagging:
seanmcguire12 Aug 1, 2024
f7bceab
added text child tagging
seanmcguire12 Aug 10, 2024
f51a353
Merge branch 'refs/heads/main' into APE-76
Aug 14, 2024
943a24c
WIP: still need to fix tests and delete debug code
seanmcguire12 Aug 14, 2024
f6ba5be
Merge branch 'main' into APE-76
seanmcguire12 Aug 15, 2024
31a1da3
Fixed missing leaf text issue
seanmcguire12 Aug 19, 2024
a1eedd7
Merge remote-tracking branch 'refs/remotes/origin/main' into APE-76
seanmcguire12 Aug 22, 2024
385e9c5
📈 Rm debug code, added !important for span styles
seanmcguire12 Aug 22, 2024
84a1ecd
📈 Rm transformXpath fn
seanmcguire12 Aug 23, 2024
a49c673
✏️ Avoid tagging separator symbols
asim-shrestha Aug 23, 2024
4b8015f
Merge branch 'main' into APE-76
seanmcguire12 Aug 29, 2024
e644419
Fix: include tags for buttons & icons that don't have text
seanmcguire12 Aug 29, 2024
dab4508
use words.length instead of boundingBoxes to determine if we should r…
seanmcguire12 Aug 29, 2024
03a386d
add spacing between tag characters. eg: [$1] becomes [ $ 1 ]
seanmcguire12 Aug 29, 2024
4016151
fix: make sure checkboxes are coloured
seanmcguire12 Aug 30, 2024
94d1b75
include placeholder text
seanmcguire12 Aug 30, 2024
93fd680
reduce height of bounding boxes to mitigate excessive use of ** in tex…
seanmcguire12 Aug 30, 2024
3d92214
update bounding box width to accommodate extra spacing inside tarsier…
seanmcguire12 Aug 30, 2024
fd1c30e
Merge branch 'main' into APE-76
seanmcguire12 Aug 30, 2024
362c85b
Merge branch 'main' into APE-76
seanmcguire12 Aug 30, 2024
b46f249
added tagless functionality for colour tagging
seanmcguire12 Aug 31, 2024
46dad31
fix: make sure tag_to_xpath returns xpaths of all coloured elements, …
seanmcguire12 Aug 31, 2024
d98989a
Merge branch 'main' into APE-76
seanmcguire12 Aug 31, 2024
7e2bf7a
get the first option of dropdown text if there is no default selected…
seanmcguire12 Aug 31, 2024
5b8c4ff
Merge branch 'main' into APE-76
seanmcguire12 Sep 1, 2024
3e0b294
added functionality to revert webpage after colour tagging. changed r…
seanmcguire12 Sep 1, 2024
0d643e2
Merge branch 'main' into APE-76
seanmcguire12 Sep 26, 2024
960d686
Merge branch 'main' into APE-76
seanmcguire12 Sep 27, 2024
917057c
refactor colour tagging
seanmcguire12 Sep 29, 2024
4c5edf4
more refactoring, store/restore DOM instead of using revert functions…
seanmcguire12 Oct 3, 2024
0e56205
reformat
seanmcguire12 Oct 4, 2024
06f1eaf
Merge branch 'main' into APE-76
seanmcguire12 Oct 4, 2024
54122e1
update lock
seanmcguire12 Oct 4, 2024
ed35707
prettier fix
seanmcguire12 Oct 4, 2024
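
Commit 03a386d above describes a formatting change in which `[$1]` becomes `[ $ 1 ]`. The following is a minimal sketch of that transformation, not the PR's actual code: the function name `space_out_tag` and the regular expression are assumptions made for illustration, and only the `@` and `$` prefixes that appear in this PR's snapshots are handled.

```python
import re

# Hypothetical helper for illustration; the PR's real implementation is not shown on this page.
# A compact tag is "[", an optional "@" or "$" prefix, a number, then "]".
COMPACT_TAG = re.compile(r"\[([@$]?)(\d+)\]")


def space_out_tag(tag: str) -> str:
    """Insert single spaces between the parts of a compact tag such as "[$1]"."""
    match = COMPACT_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a compact tag: {tag!r}")
    prefix, number = match.groups()
    # Skip the prefix slot entirely when the tag has none, so "[20]" gives "[ 20 ]".
    parts = ["[", prefix, number, "]"] if prefix else ["[", number, "]"]
    return " ".join(parts)


print(space_out_tag("[$1]"))   # [ $ 1 ]
print(space_out_tag("[20]"))   # [ 20 ]
```

The spaced form matches the tags in the older `ocr.txt` snapshot lines further down this page (for example `[ @ 10 ]`).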
674 changes: 430 additions & 244 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -14,6 +14,8 @@ playwright = "^1.44.0"
 selenium = "^4.21.0"
 google-cloud-vision = "^3.7.2"
 azure-ai-vision-imageanalysis = "^1.0.0b2"
+pillow = "^10.4.0"
+numpy = "^2.0.1"


 [tool.poetry.group.dev.dependencies]
6 changes: 5 additions & 1 deletion scripts/setup.sh
@@ -5,4 +5,8 @@ cd ..
 npm install
 npm run build

-poetry install
+poetry install
+
+cd ./tarsier-snapshots || exit 1
+poetry install
+poetry run bananalyze --download
Comment on lines +9 to +12
Contributor

not really relevant for setting up tarsier. Would delete

171 changes: 82 additions & 89 deletions tarsier-snapshots/snapshots/05W3ZEmj8pbuYSHArYUkz/ocr.txt

Large diffs are not rendered by default.

440 changes: 207 additions & 233 deletions tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/ocr.txt

Large diffs are not rendered by default.

Binary file modified tarsier-snapshots/snapshots/07wOwFaw3aGekjCBpZkg0/screenshot.png
77 changes: 37 additions & 40 deletions tarsier-snapshots/snapshots/0fdyKSMbc3kVUgL9RGiEk/ocr.txt
@@ -1,64 +1,61 @@
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
[ 0 ] O'MELVENY WORLDWIDE
[ @ 1 ]
**' Melveny** [ 2 ] PROFESSIONALS [ 3 ] SERVICES [ 4 ] INSIGHTS [ 5 ] NEWS [ 6 ] LOCATIONS [ 7 ] ABOUT [ @ 8 ] CAREERS [ @ 9 ] ALUMNI ☐
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[0] O Melveny Worldwide
[2] Professionals [3] Services [4] Insights [5] News [6] Locations [7] About [@8] Careers [@9] Alumni

[ @ 10 ] Our Team > [ 11 ] Ryan Coombs
[ 20 ] AREAS OF FOCUS
[ @ 21 ] Bank Finance
[ @ 22 ] Capital Markets
[ @ 23 ] Emerging Companies
[ @ 24 ] Private Equity
[ @ 25 ] Public Company Advisory
[@10] Our Team [11] Ryan Coombs
[20] Areas of Focus

[@21] Bank Finance
[@22] Capital Markets
[@23] Emerging Companies
[@24] Private Equity
[@25] Public Company Advisory






[37] O Melveny uses cookies to improve website functionality
and performance. By continuing to this website, you are
agreeing to our Cookie Policy.
[$39] Only use essential cookies [$40] Accept Cookies


**[12] Ryan Coombs** [@14] San Francisco Office
[13] Partner

[@15] rcoombs omm.com [16] D: +1-415-984-8943

[@26] Overview [@27] News [@28] Credentials

[ 12 ] Ryan Coombs [ @ 14 ] San Francisco Office
[ 13 ] Partner
**[29] Ryan Coombs structures and executes a broad range of capital**
**markets transactions for issuers and investment banks, including initial**
**public offerings and other common and preferred equity offerings,**
**investment grade and high-yield debt issuances, convertible notes**
**offerings, SPAC transactions, PIPEs, and other complex capital-raising**
**transactions. Clients turn to Ryan for counsel on public company**
**reporting, corporate governance and other corporate matters.**
[30] Ryan s experience spans a variety of sectors, with an emphasis on technology, and includes social media,
consumer electronics, and software, as well as entertainment, life sciences, healthcare and renewable energy.

[ @ 15 ] [email protected] [ 16 ] D: [ @ 17 ] + 1-415-984-8943 [ @ 18 ] [ @ 19 ]

[ @ 26 ] OVERVIEW [ @ 27 ] NEWS [ @ 28 ] CREDENTIALS
[31] Related Practices

[@32] Bank Finance

**[ 29 ] Ryan Coombs structures and executes a broad range of capital**
markets transactions for issuers and investment banks, including initial
public offerings and other common and preferred equity offerings,
investment grade and high - yield debt issuances, convertible notes
offerings, SPAC transactions, PIPES, and other complex capital - raising
**transactions. Clients turn to Ryan for counsel on public company**
**reporting, corporate governance and other corporate matters.**
[ 30 ] Ryan's experience spans a variety of sectors, with an emphasis on technology, and includes social media, consumer
electronics, and software, as well as entertainment, life sciences, healthcare and renewable energy.
[@33] Capital Markets

[@34] Emerging Companies

[ 31 ] RELATED PRACTICES
[ @ 32 ] Bank Finance
[@35] Private Equity

[ @ 33 ] Capital Markets
[@36] Public Company Advisory

[ @ 34 ] Emerging Companies

[ @ 35 ] Private Equity

[ @ 36 ] Public Company Advisory.


[ 37 ] O'Melveny uses cookies to improve website functionality
and performance. By continuing to this website, you are
agreeing to our [ @ 39 ] Cookie Policy [ 38 ].
[ @ 42 ]
[ 48 ] O'Melveny's latest insights, straight to your inbox [ @ 49 ] Subscribe [ @ 43 ] [ @ 44 ] Only [ @ 45 ] tial Co [ @ 46 ] [ @ 47 ] .ccept Cookies
O'Melveny
[ @ 51 ] DISCLAIMER [ @ 52 ] PRIVACY POLICY [ @ 53 ] CONTACT US [ 50 ] ATTORNEY ADVERTISING © 2023 O'MELVENY & MYERS LLP. ALL RIGHTS RESERVED
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Token count: 638
[47] O Melveny s latest insights, straight to your inbox [@48] Subscribe
[@50] Disclaimer [@51] Privacy Policy [@52] Contact Us [49] Attorney Advertising 2023 O'Melveny & Myers LLP. All rights reserved
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Token count: 483
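
The snapshot diff above mixes two tag layouts: the older lines use spaced tags such as `[ @ 10 ]`, and the newer lines use compact tags such as `[@10]`. The sketch below shows one way to read both layouts with a single pattern when comparing snapshots. The names `TAG_RE` and `extract_tags` are hypothetical and not part of Tarsier. The final lines compute the drop between the two reported token counts (638 and 483); the page does not say which tokenizer produced them.

```python
import re

# Matches both "[ @ 10 ]" (older snapshot lines) and "[@10]" (newer snapshot lines).
# Only the "@" and "$" prefixes seen in this snapshot are handled; this is an assumption.
TAG_RE = re.compile(r"\[\s*([@$]?)\s*(\d+)\s*\]")


def extract_tags(line: str) -> list[tuple[str, int]]:
    """Return (prefix, id) pairs for every tag found in one snapshot line."""
    return [(prefix, int(number)) for prefix, number in TAG_RE.findall(line)]


old_line = "[ @ 10 ] Our Team > [ 11 ] Ryan Coombs"
new_line = "[@10] Our Team [11] Ryan Coombs"

# Both layouts yield the same tags, so the two snapshots can be compared tag by tag.
assert extract_tags(old_line) == extract_tags(new_line) == [("@", 10), ("", 11)]

# Token counts as reported at the bottom of the old and new snapshot text.
old_tokens, new_tokens = 638, 483
saved = old_tokens - new_tokens
print(f"{saved} fewer tokens ({saved / old_tokens:.1%} reduction)")
```

Comparing on `(prefix, id)` pairs rather than raw text sidesteps the spacing and capitalization differences between the two snapshots.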