Closed
211 commits
b59bd87
Fix S3 region setting in duckdb connection
gabrielsntr Sep 9, 2025
a046306
Merge branch 'mindsdb:main' into main
gabrielbressan-tfy Sep 10, 2025
01d8441
Merge branch 'mindsdb:main' into main
gabrielbressan-tfy Sep 17, 2025
7381e64
Update s3_handler.py
CarvalhoRod Sep 18, 2025
0692c7a
Update connection_args.py
CarvalhoRod Sep 18, 2025
3646fe4
Merge pull request #5 from Talentify/feature/ch74022/aws-key-optional
CarvalhoRod Sep 18, 2025
a956cb3
Update s3_handler.py
CarvalhoRod Sep 18, 2025
0fba973
Merge pull request #6 from Talentify/feature/ch74022/fix-aws-key-opti…
CarvalhoRod Sep 18, 2025
96a6363
Update s3_handler.py
CarvalhoRod Sep 18, 2025
8dcf9a9
update deploy process
Sep 23, 2025
a75aea4
update deploy process
Sep 23, 2025
7f18322
update deploy process
Sep 23, 2025
453eab1
Merge branch 'mindsdb:main' into main
gabrielbressan-tfy Sep 25, 2025
6ee04b3
Merge pull request #7 from Talentify/feature/ch74022/fix-duck-aws-key…
gabrielbressan-tfy Sep 25, 2025
f054e49
feat(gmail_handler): add OAuth parameters and enhance connection logic
gabrielbressan-tfy Sep 25, 2025
7d01884
Merge branch 'main' of https://github.com/Talentify/mindsdb
gabrielbressan-tfy Sep 25, 2025
fd95024
Merge branch 'mindsdb:main' into main
gabrielbressan-tfy Sep 29, 2025
39128f5
feat(s3_handler): enhance AWS S3 connection handling with session man…
gabrielbressan-tfy Sep 29, 2025
140cc75
Merge branch 'main' of https://github.com/Talentify/mindsdb
gabrielbressan-tfy Sep 29, 2025
5597e26
feat(default_handlers): add additional default handlers for various s…
gabrielbressan-tfy Sep 29, 2025
a792a49
Enhance MS OneDrive integration with improved token management and ne…
gabrielbressan-tfy Oct 5, 2025
10c1b61
Merge branch 'mindsdb:main' into main
gabrielbressan-tfy Oct 6, 2025
7d2d01b
Add AWS S3 Vectors handler implementation with connection parameters …
gabrielbressan-tfy Oct 7, 2025
5d165b4
Add s3vectors to default handlers list
gabrielbressan-tfy Oct 7, 2025
0ff211d
Remove Hacktoberfest README files for main and use cases
gabrielbressan-tfy Oct 7, 2025
679e464
Enhance LiteLLMHandler to batch process embeddings, avoiding API limits
gabrielbressan-tfy Oct 8, 2025
625c751
Add support for additional file formats in S3Handler and FileReader
gabrielbressan-tfy Oct 9, 2025
29142a8
Refactor S3Handler and S3VectorsHandler to support AWS_PROFILE for se…
gabrielbressan-tfy Oct 9, 2025
bf5b5a3
Add metadata limits handling and configuration for vector databases
gabrielbressan-tfy Oct 10, 2025
48cee3a
Batch process GetVectors requests in S3VectorsHandler to comply with …
gabrielbressan-tfy Oct 10, 2025
fa3607f
Enhance data insertion methods to return DataFrames with processed ID…
gabrielbressan-tfy Oct 10, 2025
a4631bf
Validate key lengths for S3 Vectors operations and implement batch de…
gabrielbressan-tfy Oct 13, 2025
1f44e05
Add chunking support for text file reading in FileReader and S3Handler
gabrielbressan-tfy Oct 14, 2025
c8b2e90
Update default chunk size and overlap in FileReader; enhance text spl…
gabrielbressan-tfy Oct 17, 2025
f79258f
Enhance MSGraphAPIOneDriveClient and MSOneDriveHandler to support fil…
gabrielbressan-tfy Oct 23, 2025
86d972a
Enhance FileTable to retrieve item_id and drive_id for SharePoint fil…
gabrielbressan-tfy Oct 24, 2025
b5cf0e3
Enhance MSOneDriveHandler and FileTable to support file ID retrieval;…
gabrielbressan-tfy Oct 24, 2025
dab2e16
Merge pull request #13 from Talentify/enable-one-drive-file-picker-ca…
gabrielbressan-tfy Oct 24, 2025
b246f38
Add support for PPT and PPTX file formats in S3Handler and FileReader…
gabrielbressan-tfy Oct 27, 2025
9350094
Add Xero integration handler and associated tables
gabrielbressan-tfy Oct 30, 2025
d37c317
Enhance XeroHandler to support token injection for OAuth2 authenticat…
gabrielbressan-tfy Oct 31, 2025
09f303e
Enhance XeroHandler token management; add client credentials validati…
gabrielbressan-tfy Oct 31, 2025
e977e6b
Enhance XeroTable classes to support optimized API queries; implement…
gabrielbressan-tfy Oct 31, 2025
3b2c9c2
Enhance XeroHandler with race condition protection for token refresh;…
gabrielbressan-tfy Nov 3, 2025
aa65dad
Refactor XeroHandler to register AccountsTable correctly; remove dupl…
gabrielbressan-tfy Nov 3, 2025
b770363
Increase default result limit in SELECTQueryParser from 20 to 1000 fo…
gabrielbressan-tfy Nov 3, 2025
693b8ef
Enhance QuotesTable to support additional filters and update paramete…
gabrielbressan-tfy Nov 3, 2025
188c2ab
Enhance XeroTable to support BETWEEN operator in condition parsing; u…
gabrielbressan-tfy Nov 3, 2025
251d275
Add AccountsTable, BankTransactionsTable, and QuotesTable implementat…
gabrielbressan-tfy Nov 3, 2025
1f71d1d
Refactor XeroHandler to register new tables (BudgetsTable, ContactGro…
gabrielbressan-tfy Nov 3, 2025
7aaec9d
Add CreditNotesTable implementation and register it in XeroHandler; r…
gabrielbressan-tfy Nov 4, 2025
3b0d93b
Add InvoicesTable and ItemsTable implementations; update ContactsTabl…
gabrielbressan-tfy Nov 4, 2025
b710cc9
Add JournalsTable and ManualJournalsTable implementations; register t…
gabrielbressan-tfy Nov 4, 2025
6cd603e
Add OrganisationsTable implementation and register it in XeroHandler …
gabrielbressan-tfy Nov 4, 2025
c3e1165
Add OverpaymentsTable implementation and register it in XeroHandler f…
gabrielbressan-tfy Nov 4, 2025
a3a2c20
Refactor InvoicesTable and OverpaymentsTable to remove debug print st…
gabrielbressan-tfy Nov 4, 2025
c1b043a
Add PaymentServicesTable implementation for handling payment services…
gabrielbressan-tfy Nov 4, 2025
d649895
Add contact_id support to CreditNotesTable, OverpaymentsTable, Paymen…
gabrielbressan-tfy Nov 4, 2025
107b7e7
Implement custom JSON serialization in XeroTable for Decimal, datetim…
gabrielbressan-tfy Nov 4, 2025
5edd256
Add RepeatingInvoicesTable implementation and register it in XeroHand…
gabrielbressan-tfy Nov 4, 2025
4def82b
Implement pagination and result limit parsing for Xero API tables to …
gabrielbressan-tfy Nov 4, 2025
2d0db86
Increase page size to 1000 for PaymentsTable to enhance data retrieva…
gabrielbressan-tfy Nov 4, 2025
2adeaa1
Add report table implementations for Xero API: BalanceSheet, BankSumm…
gabrielbressan-tfy Nov 5, 2025
1418298
Enhance SQL condition handling by unwrapping TypeCast in extract_comp…
gabrielbressan-tfy Nov 5, 2025
b905b26
Add snake_case conversion for keys in XeroTable class to improve cons…
gabrielbressan-tfy Nov 5, 2025
5147614
Set default date parameters for BankSummary and ProfitLoss report tab…
gabrielbressan-tfy Nov 5, 2025
00b84bd
Update mindsdb/integrations/handlers/xero_handler/tables/items_table.py
gabrielbressan-tfy Nov 5, 2025
f015ef2
Update mindsdb/integrations/handlers/xero_handler/tables/journals_tab…
gabrielbressan-tfy Nov 5, 2025
3990222
Update mindsdb/integrations/handlers/xero_handler/tables/bank_transac…
gabrielbressan-tfy Nov 5, 2025
38f3789
Update mindsdb/integrations/handlers/xero_handler/tables/bank_transfe…
gabrielbressan-tfy Nov 5, 2025
3f75fb1
Remove obsolete config.json file
gabrielbressan-tfy Nov 5, 2025
79d30dd
Merge branch 'feat/add-xero-connector' of https://github.com/Talentif…
gabrielbressan-tfy Nov 5, 2025
223f236
Update mindsdb/integrations/handlers/xero_handler/xero_handler.py
gabrielbressan-tfy Nov 5, 2025
175abc7
Update mindsdb/integrations/handlers/xero_handler/tables/invoices_tab…
gabrielbressan-tfy Nov 5, 2025
3e14890
Merge branch 'feat/add-xero-connector' of https://github.com/Talentif…
gabrielbressan-tfy Nov 5, 2025
7abebb8
Merge pull request #14 from Talentify/feat/add-xero-connector
gabrielbressan-tfy Nov 5, 2025
1f7f25b
Change resource_id column type from Integer to BigInteger in JsonStor…
gabrielbressan-tfy Nov 6, 2025
0af00a0
Add migration to convert json_storage resource_id column from Integer…
gabrielbressan-tfy Nov 6, 2025
79202c6
Fix down_revision reference in migration to convert json_storage reso…
gabrielbressan-tfy Nov 6, 2025
c63cf5a
Add config.json to .gitignore
gabrielbressan-tfy Nov 7, 2025
942cc96
Implement Google Analytics Data API integration with reporting capabi…
gabrielbressan-tfy Nov 7, 2025
eb6c1a1
Add secure storage for Google Analytics credentials and update README
gabrielbressan-tfy Nov 7, 2025
075b8ed
Enhance Google Analytics handler to support credential storage and im…
gabrielbressan-tfy Nov 8, 2025
3c92e5e
Implement Multi-Format API Handler with support for JSON, XML, and CS…
gabrielbressan-tfy Nov 8, 2025
6c59db1
Add connection arguments for Multi-Format API handler and enhance usa…
gabrielbressan-tfy Nov 8, 2025
bf5b477
Enhance Multi-Format API Handler to support max_content_size paramete…
gabrielbressan-tfy Nov 8, 2025
4519cee
Add debug logging to API and ReportsTable handlers for improved trace…
gabrielbressan-tfy Nov 10, 2025
f4ce331
Refactor Google Analytics handler to implement metadata caching for d…
gabrielbressan-tfy Nov 10, 2025
7d2ab87
Optimize query handling by merging ORDER BY, LIMIT, and OFFSET clause…
gabrielbressan-tfy Nov 10, 2025
e0dd5e3
Enhance ReportsTable and RealtimeReportsTable to support operator-val…
gabrielbressan-tfy Nov 10, 2025
652b6eb
Add _clean_cdata_content function to sanitize CDATA sections in XML p…
gabrielbressan-tfy Nov 11, 2025
7256079
Refactor _clean_cdata_content to include date parsing and enhance con…
gabrielbressan-tfy Nov 11, 2025
e316434
Enhance Google Analytics handler documentation and authentication met…
gabrielbressan-tfy Nov 11, 2025
d8e06b0
Refactor GoogleAnalyticsHandler to remove scopes from OAuth2 credenti…
gabrielbressan-tfy Nov 11, 2025
90ccddc
Reduce Gmail API batch size and add delay to handle rate limits. Impl…
gabrielbressan-tfy Nov 11, 2025
c6dbeb5
Update mindsdb/integrations/handlers/google_analytics_handler/google_…
gabrielbressan-tfy Nov 11, 2025
c0ff041
Merge pull request #15 from Talentify/feature/sc-75221/modificar-inte…
gabrielbressan-tfy Nov 11, 2025
e894cac
Refactor HubSpot integration: separate tables into individual files f…
gabrielbressan-tfy Nov 13, 2025
ecd7965
Enhance HubSpot integration: add properties metadata table and optimi…
gabrielbressan-tfy Nov 17, 2025
fc27d71
Implement HubSpot search functionality for Companies, Contacts, and D…
gabrielbressan-tfy Nov 17, 2025
b452d7b
Add HubSpot integration for CRM tables: Companies, Contacts, Deals, a…
gabrielbressan-tfy Nov 17, 2025
e300481
Update HubSpot API client version to 12.0.0
gabrielbressan-tfy Nov 17, 2025
b64c7d3
Add HubSpot CRM integration for Products, Quotes, Tasks, and Tickets …
gabrielbressan-tfy Nov 17, 2025
9b8ac9c
Enhance HubSpot CRM integration: filter selected columns to include o…
gabrielbressan-tfy Nov 17, 2025
cfcbc2c
Enhance HubSpot integration with rate limiting and retry logic for AP…
gabrielbressan-tfy Nov 17, 2025
49b2aeb
Add HubSpot CRM integration for Owners, Pipelines, and Associations t…
gabrielbressan-tfy Nov 17, 2025
0f6a03c
Add Pipeline Stages table and enhance HubSpot integration with query …
gabrielbressan-tfy Nov 18, 2025
c93b369
Update mindsdb/integrations/handlers/hubspot_handler/tables/crm/deals…
gabrielbressan-tfy Nov 18, 2025
c449d4d
Enhance HubSpot Associations integration: return empty DataFrame with…
gabrielbressan-tfy Nov 18, 2025
1d54441
Merge branch 'feature/sc-75274/melhorar-conector-do-hubspot-no-mindsd…
gabrielbressan-tfy Nov 18, 2025
fa78152
Add method to check for aggregates and GROUP BY in view queries to pr…
gabrielbressan-tfy Nov 19, 2025
39d0e82
Enhance HubSpot handler with OAuth2 support and automatic token refre…
gabrielbressan-tfy Nov 20, 2025
9c089f4
Install and load the httpfs extension in S3Handler for enhanced S3 co…
gabrielbressan-tfy Nov 20, 2025
94c09ff
Merge branch 'main' into feature/sc-75274/melhorar-conector-do-hubspo…
gabrielbressan-tfy Nov 20, 2025
8214b2a
Add client_id and client_secret to refresh token request in HubSpot h…
gabrielbressan-tfy Nov 20, 2025
966c3e0
Add schema validation mode to AssociationsTable for handling view cre…
gabrielbressan-tfy Nov 20, 2025
e43e40e
Merge pull request #16 from Talentify/feature/sc-75274/melhorar-conec…
gabrielbressan-tfy Nov 20, 2025
085c9fc
Enhance BigQueryHandler with table filtering capabilities
gabrielbressan-tfy Dec 4, 2025
7f3aad4
Merge pull request #23 from Talentify/feature/sc-75576/melhorar-conec…
gabrielbressan-tfy Dec 8, 2025
4ec72ec
Fix filter expression usage in ReportsTable and RealtimeReportsTable
gabrielbressan-tfy Dec 9, 2025
12b5385
Merge pull request #24 from Talentify/bug/sc-75586/fix-google-analyti…
gabrielbressan-tfy Dec 9, 2025
151cde4
Fix handling of IN/NOT IN clauses in filter_dataframe function
gabrielbressan-tfy Jan 15, 2026
17ef964
Update mindsdb/integrations/utilities/sql_utils.py
gabrielbressan-tfy Jan 15, 2026
8c3c537
Merge pull request #30 from Talentify/fix-enclosing-quotes-in-argument
gabrielbressan-tfy Jan 15, 2026
ab73299
Fix handling of field parts in order by clauses to prevent attribute …
gabrielbressan-tfy Jan 19, 2026
da5b6f8
Merge pull request #32 from Talentify/bug/sc-76015/erro-na-query-do-m…
gabrielbressan-tfy Jan 19, 2026
0d8be10
Add support for Google Gen AI in reranker and update requirements
gabrielbressan-tfy Jan 19, 2026
502fff7
Enhance process management and cleanup in reranker and process cache
gabrielbressan-tfy Jan 20, 2026
efb21b3
Merge pull request #33 from Talentify/bug/sc-76019/mindsdb-reranker-u…
gabrielbressan-tfy Jan 20, 2026
f661cc9
Integrate Google Search Console API support with enhanced connection …
gabrielbressan-tfy Feb 11, 2026
d4681d4
fix scopes
gabrielbressan-tfy Feb 11, 2026
1a5f416
fix table
gabrielbressan-tfy Feb 12, 2026
f4bd2d8
Add data_state parameter support in Google Search Console API integra…
gabrielbressan-tfy Feb 12, 2026
50b2de2
Enhance Google Search Console integration with data state and start r…
gabrielbressan-tfy Feb 12, 2026
b4c1890
Enhance Google Search Console integration with dynamic column handlin…
gabrielbressan-tfy Feb 12, 2026
6c83244
Enhance Google Search Console integration with dimension filtering su…
gabrielbressan-tfy Feb 12, 2026
26d77d3
Refactor Google Search Console integration: update parameter names fo…
gabrielbressan-tfy Feb 12, 2026
6d1164c
Enhance Google Search Console integration: add default date handling …
gabrielbressan-tfy Feb 12, 2026
8a04c33
Enhance Google Calendar integration: add support for querying multipl…
gabrielbressan-tfy Feb 13, 2026
a95d272
Enhance Google Calendar integration: add time formatting functions an…
gabrielbressan-tfy Feb 13, 2026
9a49696
Refactor Google Calendar integration: standardize timezone parameter …
gabrielbressan-tfy Feb 13, 2026
89ab7f1
Enhance Google Calendar integration: add event field flattening and t…
gabrielbressan-tfy Feb 13, 2026
7174963
Merge pull request #39 from Talentify/feature/sc-75568/integrar-com-a…
gabrielbressan-tfy Feb 13, 2026
3421f2d
Enhance XML parsing: clean HTML tags and extract URLs from anchor tags
gabrielbressan-tfy Feb 15, 2026
a902838
Enhance error handling and improve user feedback in query planning an…
gabrielbressan-tfy Feb 16, 2026
96a68f8
Enhance query planning: improve identifier collection for complex exp…
gabrielbressan-tfy Feb 18, 2026
bf0084d
Enhance query handling: return all raw columns for complex expression…
gabrielbressan-tfy Feb 18, 2026
65b52bd
Add MindsDB AI Coding Agent Guidelines documentation
gabrielbressan-tfy Feb 18, 2026
67feba4
Fix query planner: prevent forwarding ORDER BY to handler in plan_api…
gabrielbressan-tfy Feb 18, 2026
1bea2d1
Merge pull request #41 from Talentify/fix-urls-html-tag
gabrielbressan-tfy Feb 18, 2026
ba26c27
Enhance condition extraction: allow unwrapping of LOWER/UPPER functio…
gabrielbressan-tfy Feb 18, 2026
8cb705c
Fix JOIN column collection: include WHERE clause columns and refine f…
gabrielbressan-tfy Feb 18, 2026
50fd045
Refactor WHERE clause handling: strip absent columns and improve cond…
gabrielbressan-tfy Feb 19, 2026
4fa5737
Investigate GA handler property mix
gabrielbressan-tfy Feb 27, 2026
41fe8bf
Add context to GA run_report log
gabrielbressan-tfy Feb 27, 2026
b4073a2
Merge pull request #42 from Talentify/codex/troubleshoot-ga-handler-p…
gabrielbressan-tfy Mar 6, 2026
ae821bb
Merge upstream/main into fork
gabrielbressan-tfy Mar 6, 2026
002533e
Fix aipdf dependency conflict: downgrade to 0.0.6.3
gabrielbressan-tfy Mar 6, 2026
d3a1507
fix
gabrielbressan-tfy Mar 6, 2026
c07a395
Merge pull request #43 from Talentify/merge/upstream-main
gabrielbressan-tfy Mar 6, 2026
6b6b9a2
github connector
gabrielbressan-tfy Mar 8, 2026
e43ad4d
Merge pull request #44 from Talentify/feature-github-connector-improv…
gabrielbressan-tfy Mar 8, 2026
15d3a50
fix ga
gabrielbressan-tfy Mar 12, 2026
1024676
fix github
gabrielbressan-tfy Mar 12, 2026
790cc1f
Merge pull request #45 from Talentify/fix-query-issues
gabrielbressan-tfy Mar 12, 2026
5c5a19c
Add Sentry handler phase 1
patrickadeelino Mar 18, 2026
335ad30
Handle TypeCast order by in API resources
patrickadeelino Mar 18, 2026
182bdc8
Support Sentry issue date filters
patrickadeelino Mar 18, 2026
2e241f1
fix
patrickadeelino Mar 24, 2026
46b5210
Merge pull request #46 from Talentify/feature/sentry-handler-phase-1
patrickadeelino Mar 24, 2026
360b57d
Scope Sentry handler issues by environment
patrickadeelino Mar 24, 2026
f65701a
Remove issue payload environment column from Sentry table
patrickadeelino Mar 25, 2026
1917453
Merge pull request #47 from Talentify/patrick/sentry-environment-scop…
magroski Mar 25, 2026
e7befd1
microsoft ads
gabrielbressan-tfy Mar 26, 2026
eaf6f15
feat: add linkedin ads handler
patrickadeelino Mar 27, 2026
d3c06b6
refactor(linkedin-ads): separate responsibilities into specialized mo…
patrickadeelino Mar 30, 2026
c55e0c0
adding it to default handler
gabrielbressan-tfy Mar 30, 2026
4c8f3e4
Merge pull request #48 from Talentify/patrick/linkedin-ads-mindsdb-20…
magroski Mar 30, 2026
3962ee8
fix
gabrielbressan-tfy Mar 30, 2026
5b3de09
Merge branch 'main' into feature/sc-76874/-mindsdb-bing-ads-integration
gabrielbressan-tfy Mar 30, 2026
81467de
Merge pull request #49 from Talentify/feature/sc-76874/-mindsdb-bing-…
gabrielbressan-tfy Mar 30, 2026
967ef14
feat: add sentry explore logs support (#50)
patrickadeelino Apr 1, 2026
840578a
work
gabrielbressan-tfy Apr 6, 2026
8a47896
default handlers
gabrielbressan-tfy Apr 6, 2026
7890cdf
tables
gabrielbressan-tfy Apr 6, 2026
2b10f4b
Merge pull request #51 from Talentify/feature/sc-76870/-mindsdb-googl…
gabrielbressan-tfy Apr 7, 2026
6b02fba
first tables
gabrielbressan-tfy Apr 7, 2026
0fb8621
new tables
gabrielbressan-tfy Apr 7, 2026
af636c0
fix connection params
gabrielbressan-tfy Apr 9, 2026
bbb3444
removing staff members deprecated
gabrielbressan-tfy Apr 9, 2026
3b10720
fix tables
gabrielbressan-tfy Apr 9, 2026
8a010b1
complex queries
gabrielbressan-tfy Apr 9, 2026
381ae0f
Merge pull request #52 from Talentify/feature/sc-77168/melhorias-no-c…
gabrielbressan-tfy Apr 9, 2026
892dba1
keywords planner
gabrielbressan-tfy Apr 17, 2026
deebcac
adding geo targets and language to cache to speed up
gabrielbressan-tfy Apr 17, 2026
dd36367
Merge pull request #53 from Talentify/feature/sc-77255/keyword-planne…
gabrielbressan-tfy Apr 17, 2026
f0fe705
feat: add Meta Ad Library handler (#54)
patrickadeelino Apr 22, 2026
69c4cb1
feat: add POST method and body support to multi_format_api_handler
gabrielbressan-tfy Apr 24, 2026
7e2608f
fixed parser
gabrielbressan-tfy Apr 27, 2026
84e632e
Merge pull request #55 from Talentify/modify-multi-format-api-handler…
gabrielbressan-tfy Apr 27, 2026
66dd8f1
fix(meta-ad-library): default reached countries to US
patrickadeelino Apr 28, 2026
8a4c6c1
Merge pull request #56 from Talentify/codex/meta-ad-library-us-default
patrickadeelino Apr 29, 2026
cd1eb0b
fix: apply ad_reached_countries pushdown in Meta Ad Library (#57)
patrickadeelino May 4, 2026
735d746
Avoid reapplying WHERE in API integration subselects
gabrielbressan-tfy May 18, 2026
ade6206
fix: strip only handler-consumed WHERE conditions from SubSelectStep
gabrielbressan-tfy May 18, 2026
54d89a8
Merge pull request #58 from Talentify/codex/fix-intercepted-where-params
gabrielbressan-tfy May 18, 2026
101bb26
Fix trace_id and span_id in Sentry logs (#59)
patrickadeelino May 22, 2026
05a8f14
feat: improve s3 handler file listing
gabrielbressan-tfy May 28, 2026
e7dc72b
fix region name
gabrielbressan-tfy Jun 6, 2026
c3d716a
Merge pull request #60 from Talentify/feat-improvements-to-s3-handler
gabrielbressan-tfy Jun 6, 2026
5ad2cd2
feat: capture bigquery query stats for mktplace metering
gabrielbressan-tfy Jun 7, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -65,5 +65,9 @@ node_modules
mindsdb/**/pyproject.toml
mindsdb/**/uv.lock

config.json

# dsi tests
reports/
242 changes: 242 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,242 @@
# MindsDB — AI Coding Agent Guidelines

## Development Environment

- **Hot-reload**: MindsDB runs in Docker with a bind mount (`./mindsdb:/mindsdb`) and `watchfiles`. Python file changes take effect immediately — no container restart needed.
- **Testing queries**: Use `mindsdb_sdk` connecting to `http://127.0.0.1:47334`.
- **Config**: `config.json` at the project root, mounted at `/root/mindsdb_config.json` in the container.
- **Environment variables**: see `.env` (do not commit secrets; `GOOGLE_API_KEY` and DB credentials live there).

---

## Handler Architecture

### How the query planner splits API handler queries

When a handler is registered with `class_type = "api"`, MindsDB's query planner splits every SELECT into two steps:

1. **`FetchDataframeStep`** → calls the handler's `select()` with the original query (including complex targets). The handler must return the raw DataFrame with the columns DuckDB will need.
2. **`SubSelectStep`** → DuckDB executes the full original SELECT expression (CASE WHEN, SUM, GROUP BY, etc.) on top of the DataFrame from step 1.

**Implication**: handlers do not need to implement aggregations, CASE WHEN, or arithmetic. They only need to return the right raw columns. DuckDB handles everything else.
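The two-step split can be illustrated with a small stdlib-only sketch. It is an analogy, not MindsDB code: SQLite stands in for DuckDB, and the table name `handler_result` and the row values are hypothetical. Step 1 returns only raw rows; step 2 runs the full SELECT (CASE WHEN, SUM, GROUP BY, ORDER BY on an alias) over them.

```python
import sqlite3
from typing import Dict, List, Tuple


def fetch_raw_rows() -> List[Dict]:
    """FetchDataframeStep analogue: the handler returns raw rows only (hypothetical data)."""
    return [
        {"source": "google", "medium": "cpc", "sessions": 10},
        {"source": "google", "medium": "organic", "sessions": 5},
        {"source": "linkedin", "medium": "cpc", "sessions": 3},
    ]


def run_outer_select(rows: List[Dict], sql: str) -> List[Tuple]:
    """SubSelectStep analogue: evaluate the full SELECT over the raw rows.

    SQLite stands in for DuckDB so the sketch needs only the standard library.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE handler_result (source TEXT, medium TEXT, sessions INTEGER)"
        )
        conn.executemany(
            "INSERT INTO handler_result VALUES (:source, :medium, :sessions)", rows
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


OUTER_SQL = """
SELECT CASE WHEN medium = 'cpc' THEN 'paid' ELSE 'organic' END AS channel,
       SUM(sessions) AS total_sessions
FROM handler_result
GROUP BY channel
ORDER BY total_sessions DESC
"""

print(run_outer_select(fetch_raw_rows(), OUTER_SQL))
# → [('paid', 13), ('organic', 5)]
```

The handler side never sees `CASE`, `SUM`, or `total_sessions`; it only has to supply `medium` and `sessions`.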

---

## Handler `select()` — Two Patterns

### Pattern A: Data-fetch-and-filter (most handlers)

The handler fetches all data from the API and then drops the columns that weren't requested. The Calendar, Search Console, email, HubSpot, Shopify, and Xero handlers all use this pattern.

**Correct implementation:**

```python
selected_columns = []
for target in query.targets:
    if isinstance(target, ast.Star):
        selected_columns = self.get_columns()
        break
    elif isinstance(target, ast.Identifier):
        selected_columns.append(target.parts[-1])
    else:
        # Complex expression (CASE WHEN, SUM, BinaryOperation, etc.).
        # The outer SubSelectStep/DuckDB layer handles the computation.
        # Return all raw columns so DuckDB has what it needs.
        selected_columns = self.get_columns()
        break
if not selected_columns:
    selected_columns = self.get_columns()
```

**Bugs to avoid:**
- `raise ValueError(f"Unknown query target {type(target)}")` — breaks any CTE or aggregation query.
- Silently skipping non-Identifier targets without a fallback — `selected_columns` stays empty and `set(df.columns).difference(set([]))` drops every column, returning an empty DataFrame.
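A runnable sketch of the Pattern A fallbacks, assuming hypothetical dataclass stand-ins for the `mindsdb_sql_parser.ast` node types (so it runs without MindsDB installed) and a hypothetical fixed column list in place of `get_columns()`. It also reproduces the empty-`selected_columns` bug with the same set difference the handlers use.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for mindsdb_sql_parser.ast node types.
@dataclass
class Star:
    pass


@dataclass
class Identifier:
    parts: List[str]


@dataclass
class Function:
    op: str
    args: list = field(default_factory=list)


ALL_COLUMNS = ["id", "summary", "start_time", "end_time"]  # hypothetical get_columns()


def resolve_selected_columns(targets) -> List[str]:
    """Pattern A target resolution with both fallbacks."""
    selected: List[str] = []
    for target in targets:
        if isinstance(target, Star):
            selected = list(ALL_COLUMNS)
            break
        elif isinstance(target, Identifier):
            selected.append(target.parts[-1])
        else:
            # Complex expression: return every raw column and let DuckDB compute it.
            selected = list(ALL_COLUMNS)
            break
    if not selected:
        selected = list(ALL_COLUMNS)
    return selected


def columns_to_drop(selected: List[str]) -> set:
    # Mirrors set(df.columns).difference(set(selected_columns)).
    return set(ALL_COLUMNS).difference(set(selected))


# COUNT(id) is a complex target; silently skipping it would leave the list empty.
targets = [Function(op="count", args=[Identifier(parts=["id"])])]
print(resolve_selected_columns(targets))  # → ['id', 'summary', 'start_time', 'end_time']
print(len(columns_to_drop([])))           # → 4: the bug drops every column
```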

### Pattern B: Column-selection-determines-API-params (e.g., Google Analytics)

The handler uses the SELECT targets to decide *what* to request from the API (GA4 dimensions vs metrics, Search Console dimensions, etc.). A raw `isinstance(target, ast.Identifier)` check silently skips columns referenced inside complex expressions, causing the API to be called with incomplete parameters.

**Correct implementation — add a recursive `_collect_identifiers` helper before the table class:**

```python
from typing import List
from mindsdb_sql_parser import ast


def _collect_identifiers(node) -> List[str]:
    """Recursively collect all Identifier column names from any AST node.

    Walks into CASE WHEN, Function args, BinaryOperation, etc. so that
    columns referenced inside complex expressions are not missed.
    """
    if node is None:
        return []
    if isinstance(node, ast.Identifier):
        return [str(node.parts[-1])]
    if isinstance(node, ast.Case):
        names = []
        for condition, result in node.rules:
            names.extend(_collect_identifiers(condition))
            names.extend(_collect_identifiers(result))
        names.extend(_collect_identifiers(node.default))
        return names
    if isinstance(node, ast.Function):
        names = []
        for arg in (node.args or []):
            names.extend(_collect_identifiers(arg))
        return names
    if isinstance(node, ast.BinaryOperation):
        return _collect_identifiers(node.args[0]) + _collect_identifiers(node.args[1])
    if isinstance(node, ast.UnaryOperation):
        return _collect_identifiers(node.args[0])
    if isinstance(node, ast.TypeCast):
        return _collect_identifiers(node.arg)
    return []
```

**Then use it in `select()`:**

```python
seen = set()
for target in query.targets:
    if isinstance(target, ast.Star):
        # fall back to default dimensions/metrics
        break
    for col_name in _collect_identifiers(target):
        if col_name in seen:
            continue
        seen.add(col_name)
        # classify col_name as dimension or metric and add to API params
```
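The classification step left as a comment in the loop could look like the sketch below. The metric catalogue and the `SELECT *` defaults are hypothetical; a real GA4 handler would derive them from property metadata. The input list is what `_collect_identifiers` would yield for the targets `date`, `sessions`, `SUM(CASE WHEN sessionSource = 'google' THEN sessions ELSE 0 END)`, and `activeUsers`.

```python
from typing import Iterable, List, Tuple

# Hypothetical catalogue and defaults, for illustration only.
METRIC_NAMES = {"sessions", "activeUsers", "screenPageViews"}
DEFAULT_DIMENSIONS = ["date"]
DEFAULT_METRICS = ["sessions"]


def classify_columns(collected: Iterable[str], star: bool = False) -> Tuple[List[str], List[str]]:
    """Split collected column names into (dimensions, metrics), de-duplicated in order."""
    if star:
        return list(DEFAULT_DIMENSIONS), list(DEFAULT_METRICS)
    dimensions: List[str] = []
    metrics: List[str] = []
    seen = set()
    for name in collected:
        if name in seen:
            continue
        seen.add(name)
        (metrics if name in METRIC_NAMES else dimensions).append(name)
    return dimensions, metrics


collected = ["date", "sessions", "sessionSource", "sessions", "activeUsers"]
print(classify_columns(collected))
# → (['date', 'sessionSource'], ['sessions', 'activeUsers'])
```

Without the recursive walk, `sessionSource` would never reach the API request and DuckDB would later fail to evaluate the CASE expression.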

---

## Query Planner — Known Bugs Fixed in This Codebase

### 1. CTE must be cleared after `plan_cte()` — `query_planner.py`

After `self.plan_cte(query)` decomposes CTEs into steps, `query.cte` must be set to `None`. Otherwise the outer SELECT (which may reference a CTE name that resolves to a handler table) carries the full CTE definition into DuckDB, which fails with:

> `Catalog Error: Table with name <handler_table> does not exist`

```python
if query.cte is not None:
    self.plan_cte(query)
    query.cte = None  # CTEs decomposed into steps; clear so DuckDB doesn't re-execute them
```
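The failure mode can be reproduced by analogy with SQLite from the standard library. This is not the planner's code: the CTE name `recent` and handler table name `ga_report` are hypothetical, and SQLite's error text differs from DuckDB's. Once the CTE result has been materialized under its own name, a query that still carries the `WITH` clause makes the engine look for the handler table, which only the handler can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical: the CTE `recent` was already computed from the handler table
# `ga_report` and registered under its own name, as the plan_cte() steps do.
conn.execute("CREATE TABLE recent (day TEXT, sessions INTEGER)")
conn.executemany("INSERT INTO recent VALUES (?, ?)", [("2026-01-01", 4), ("2026-01-02", 6)])

outer = "SELECT SUM(sessions) FROM recent"
stale_cte = "WITH recent AS (SELECT day, sessions FROM ga_report) " + outer


def run(sql):
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"


print(run(stale_cte))  # CTE left on the query: the engine looks for ga_report
print(run(outer))      # CTE cleared: reads the materialized result → [(10,)]
```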

### 2. `plan_api_db_select` must NOT forward `order_by` to the handler — `query_planner.py`

`plan_api_db_select` splits a query into a handler fetch (`FetchDataframeStep`) and a DuckDB pass (`SubSelectStep`). Forwarding the query's `order_by` to the handler is wrong: ORDER BY may reference **SQL aliases** (e.g. `SUM(sessions) AS total_sessions` → `ORDER BY total_sessions`) that mean nothing to the underlying API. For such a query, the GA4 API returns:

> `400 Field total_sessions exists in OrderBy but is not defined in input Dimensions/Metrics list`

The outer SubSelectStep already retains `order_by` (it is not cleared like `where`/`limit`), so DuckDB applies it correctly after aggregation.

```python
# query_planner.py — plan_api_db_select()
query2 = Select(
    targets=query.targets,
    from_table=query.from_table,
    where=query.where,
    # order_by intentionally omitted: ORDER BY may reference SQL aliases unknown
    # to the underlying API. The SubSelectStep/DuckDB layer handles it correctly.
    limit=query.limit,
)
```
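A stdlib-only sketch of why the alias has to be resolved after aggregation. `fake_run_report` is a hypothetical stand-in that mimics the GA4 rule quoted above (ORDER BY fields must be requested dimensions or metrics) and returns canned rows. The plain-Python aggregation and sort play the role of the SubSelectStep/DuckDB pass.

```python
from typing import Dict, List, Optional, Tuple


def fake_run_report(dimensions: List[str], metrics: List[str],
                    order_by: Optional[str] = None) -> List[Dict]:
    """Hypothetical API stand-in that only accepts ORDER BY on requested fields."""
    if order_by is not None and order_by not in dimensions + metrics:
        raise ValueError(
            f"Field {order_by} exists in OrderBy but is not defined "
            "in input Dimensions/Metrics list"
        )
    # Canned rows; a real API call would go here.
    return [
        {"source": "google", "sessions": 5},
        {"source": "bing", "sessions": 2},
        {"source": "google", "sessions": 4},
    ]


def select_total_sessions() -> List[Tuple[str, int]]:
    # SELECT source, SUM(sessions) AS total_sessions ... GROUP BY source
    # ORDER BY total_sessions DESC
    rows = fake_run_report(["source"], ["sessions"])  # order_by not forwarded
    totals: Dict[str, int] = {}
    for row in rows:  # aggregation, done by the outer SQL layer
        totals[row["source"]] = totals.get(row["source"], 0) + row["sessions"]
    # The alias only exists after aggregation, so sorting happens here.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


print(select_total_sessions())  # → [('google', 9), ('bing', 2)]

try:
    fake_run_report(["source"], ["sessions"], order_by="total_sessions")
except ValueError as exc:
    print(exc)  # the alias means nothing to the API
```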

### 3. Handler-consumed WHERE params must not be re-evaluated by SubSelectStep — `api_handler.py`, `subselect_step.py`

`plan_api_db_select` splits an API query into `FetchDataframeStep` (handler) + `SubSelectStep` (DuckDB). Both receive the original WHERE (`plan_sub_select` deep-copies it). When a handler-consumed param name (e.g. `url`) collides with a column in the API response, DuckDB re-evaluates the condition against the response value and filters out all rows.

**Fix**: `APIResource.select()` propagates applied column names via `DataFrame.attrs['_applied_where_columns']`. `SubSelectStepCall` reads them and strips matching conditions from WHERE before DuckDB runs. Only handler-consumed conditions are stripped; non-consumed conditions remain for double-filtering safety.

```python
# api_handler.py — APIResource.select(), after filter_dataframe()
applied_where_cols = {cond.column.lower() for cond in conditions if cond.applied}
if applied_where_cols:
    result.attrs['_applied_where_columns'] = applied_where_cols

# subselect_step.py — SubSelectStepCall.call(), after _strip_where_absent_columns()
applied_cols = df.attrs.get('_applied_where_columns', set())
if applied_cols:
    query.where = _strip_applied_where_columns(query.where, applied_cols)
```
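The body of `_strip_applied_where_columns` is not shown here. The following is a hypothetical reimplementation over dataclass stand-ins for the AST nodes, not the repository's code. Its design choice (an assumption): it removes comparisons on handler-consumed columns only inside AND chains and leaves OR subtrees untouched, because dropping one side of an OR would change the query's meaning.

```python
from dataclasses import dataclass
from typing import Optional, Set, Union


# Hypothetical stand-ins for mindsdb_sql_parser.ast node types.
@dataclass
class Identifier:
    parts: list


@dataclass
class Constant:
    value: object


@dataclass
class BinaryOperation:
    op: str
    args: list


Node = Union[Identifier, Constant, BinaryOperation]


def strip_applied(where: Optional[Node], applied: Set[str]) -> Optional[Node]:
    """Drop comparisons on columns the handler already applied; keep everything else."""
    if not isinstance(where, BinaryOperation):
        return where
    op = where.op.lower()
    if op == "and":
        left = strip_applied(where.args[0], applied)
        right = strip_applied(where.args[1], applied)
        if left is None:
            return right
        if right is None:
            return left
        return BinaryOperation("and", [left, right])
    if op == "or":
        return where  # never remove half of an OR
    column = where.args[0]
    if isinstance(column, Identifier) and column.parts[-1].lower() in applied:
        return None  # the handler already filtered on this column
    return where


where = BinaryOperation("and", [
    BinaryOperation("=", [Identifier(["url"]), Constant("https://example.com")]),
    BinaryOperation(">", [Identifier(["clicks"]), Constant(10)]),
])
print(strip_applied(where, {"url"}))
# → BinaryOperation(op='>', args=[Identifier(parts=['clicks']), Constant(value=10)])
```

The `applied` set is lowercased, matching how `APIResource.select()` builds `applied_where_cols`.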

### 4. JOIN column collection must include WHERE — `plan_join.py`

`_collect_fetch_columns` runs on `query.targets` and `tbl.join_condition`, but columns referenced **only in the WHERE clause** (e.g. `LOWER(t2.sessionSourceMedium) LIKE '%linkedin%'`) are never added to `referenced_cols`. The handler then does not fetch them, and DuckDB fails with `Column not found`.

**Fix**: also traverse `query.where`:

```python
query_traversal(query.targets, _collect_fetch_columns)
query_traversal(query.where, _collect_fetch_columns) # ← required
for tbl in self.tables:
    if tbl.join_condition is not None:
        query_traversal(tbl.join_condition, _collect_fetch_columns)
```
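A self-contained sketch of the effect, assuming a minimal hypothetical walker in place of `query_traversal` and dataclass stand-ins for the AST nodes. Walking only the targets misses `sessionSourceMedium`, which appears solely inside `LOWER(...)` in the WHERE clause.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


# Hypothetical stand-ins for mindsdb_sql_parser.ast node types.
@dataclass
class Identifier:
    parts: List[str]


@dataclass
class Constant:
    value: object


@dataclass
class Function:
    op: str
    args: list = field(default_factory=list)


@dataclass
class BinaryOperation:
    op: str
    args: list = field(default_factory=list)


def traverse(node, callback: Callable) -> None:
    """Hypothetical stand-in for query_traversal: call `callback` on every node."""
    if node is None:
        return
    if isinstance(node, list):
        for item in node:
            traverse(item, callback)
        return
    callback(node)
    for child in getattr(node, "args", None) or []:
        traverse(child, callback)


def referenced_columns(*clauses) -> Set[str]:
    referenced: Set[str] = set()

    def _collect_fetch_columns(node):
        if isinstance(node, Identifier):
            referenced.add(node.parts[-1])

    for clause in clauses:
        traverse(clause, _collect_fetch_columns)
    return referenced


# SELECT t2.date ... WHERE LOWER(t2.sessionSourceMedium) LIKE '%linkedin%'
targets = [Identifier(["t2", "date"])]
where = BinaryOperation("like", [
    Function("lower", [Identifier(["t2", "sessionSourceMedium"])]),
    Constant("%linkedin%"),
])

print(referenced_columns(targets))         # → {'date'}: WHERE column never fetched
print(referenced_columns(targets, where))  # both columns are fetched
```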

### 5. JOIN `filter_col_names` must use `item.conditions`, not `conditions` — `plan_join.py`

`process_table()` computes `filter_col_names` to exclude API filter parameters (e.g. `start_date = 'yesterday'`) from the SELECT list so they aren't sent to the API as dimensions. Two bugs to avoid:

1. **`conditions` is cleared to `[]` when OR is in the WHERE clause** — so filter params would not be excluded, and they'd appear as GA4 dimension targets → 400 error. Use `item.conditions` (pre-OR-clear) instead.
2. **`IS NULL` is a `BinaryOperation` with `Constant(None)` as the partner** — `landingPagePlusQueryString IS NULL` would wrongly add `landingPagePlusQueryString` to `filter_col_names` and exclude it from the SELECT. Guard with `other.value is not None`.

```python
filter_col_names = set()
for cond in item.conditions: # ← item.conditions, not conditions
    if isinstance(cond, BinaryOperation) and len(cond.args) >= 2:
        for i, arg in enumerate(cond.args[:2]):
            if isinstance(arg, Identifier):
                other = cond.args[1 - i]
                if isinstance(other, Constant) and other.value is not None:  # ← non-null only
                    filter_col_names.add(arg.parts[-1])
fetch_cols = referenced_cols - filter_col_names
```
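A runnable check of the `IS NULL` guard, assuming dataclass stand-ins for the AST nodes and a hypothetical `referenced_cols` set. Following the description above, `IS NULL` is modelled as a `BinaryOperation` whose partner is `Constant(None)`; the `skip_null_partners` flag exists only so the sketch can show the buggy behaviour alongside the fixed one.

```python
from dataclasses import dataclass
from typing import List, Set


# Hypothetical stand-ins for mindsdb_sql_parser.ast node types.
@dataclass
class Identifier:
    parts: List[str]


@dataclass
class Constant:
    value: object


@dataclass
class BinaryOperation:
    op: str
    args: list


def filter_col_names_for(conditions, skip_null_partners: bool = True) -> Set[str]:
    """Names of columns compared against a constant, i.e. API filter parameters."""
    names: Set[str] = set()
    for cond in conditions:
        if isinstance(cond, BinaryOperation) and len(cond.args) >= 2:
            for i, arg in enumerate(cond.args[:2]):
                if isinstance(arg, Identifier):
                    other = cond.args[1 - i]
                    if isinstance(other, Constant) and (
                        other.value is not None or not skip_null_partners
                    ):
                        names.add(arg.parts[-1])
    return names


# WHERE start_date = 'yesterday' AND landingPagePlusQueryString IS NULL
conditions = [
    BinaryOperation("=", [Identifier(["start_date"]), Constant("yesterday")]),
    BinaryOperation("is", [Identifier(["landingPagePlusQueryString"]), Constant(None)]),
]
referenced_cols = {"date", "sessions", "start_date", "landingPagePlusQueryString"}

with_guard = referenced_cols - filter_col_names_for(conditions)
without_guard = referenced_cols - filter_col_names_for(conditions, skip_null_partners=False)
print(sorted(with_guard))     # → ['date', 'landingPagePlusQueryString', 'sessions']
print(sorted(without_guard))  # → ['date', 'sessions']
```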

---

## Handler Checklist

When creating or modifying a handler's `select()` method:

- [ ] Does the handler use target columns to control API parameters (Pattern B)?
  - If yes: use `_collect_identifiers()` to recursively extract column names.
- [ ] Does the handler fetch all data and then filter by column (Pattern A)?
  - If yes: add `else: selected_columns = self.get_columns(); break` and an `if not selected_columns: selected_columns = self.get_columns()` guard.
- [ ] Never `raise ValueError` on unrecognised target types — complex expressions are valid inputs from the planner.
- [ ] Never leave `selected_columns` empty after the targets loop — that silently drops all result columns.
- [ ] WHERE filter params (e.g., `start_date`, `end_date`) should be extracted from `query.where` and passed to the API, not treated as SELECT dimensions.
- [ ] `get_columns()` must list every column the API can return so Pattern A drop-logic works correctly.

---

## Handlers in This Project

| Handler | Pattern | Notes |
|---|---|---|
| `google_analytics_handler` | B | Uses `_collect_identifiers`; target columns map to GA4 dimensions/metrics |
| `google_calendar_handler` | A | Fetches all events/calendars/free-busy, then filters columns |
| `google_search_handler` | A | Fetches traffic/sitemaps/url-inspection data, then filters columns |
| `email_handler` | A (via `SELECTQueryParser`) | Delegated to utility — safe |
| `hubspot_handler` | A (via `SELECTQueryParser`) | Delegated to utility — safe |
| `shopify_handler` | A (via `SELECTQueryParser`) | Delegated to utility — safe |
| `xero_handler` | A | No target iteration — safe |
| `ms_one_drive_handler` | A | String checks only — safe |
| `web_handler` (`url_reader`) | A | Uses `FilterCondition`, no target iteration — safe |
| `s3_handler` | A | Only scans targets for the `"content"` key; the full query is passed to DuckDB |

---

## Relevant Source Paths

| File | Purpose |
|---|---|
| `mindsdb/api/executor/planner/query_planner.py` | `plan_select`, `plan_cte`, `plan_api_db_select`, `get_integration_select_step` |
| `mindsdb/api/executor/planner/plan_join.py` | `PlanJoinTablesQuery`, `process_table`, `get_filters_from_join_conditions` |
| `mindsdb/api/executor/sql_query/steps/subselect_step.py` | `SubSelectStepCall` — runs DuckDB on handler result |
| `mindsdb/api/executor/utilities/sql.py` | `query_df`, `query_df_with_type_infer_fallback` |
| `mindsdb/integrations/utilities/query_traversal.py` | `query_traversal` — AST walker used across planner and handlers |
| `mindsdb/integrations/handlers/<name>/` | Individual handler implementations |
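`SubSelectStepCall` queries the handler's result again. That is why a handler must not drop columns that an outer expression still references. The sketch below is a rough illustration only, not the planner's code. It uses the standard-library `sqlite3` module as a stand-in for DuckDB, and `run_outer_select` and the `handler_result` table name are hypothetical.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def run_outer_select(rows: List[Dict[str, object]], columns: Sequence[str], sql: str) -> List[Tuple]:
    """Load a handler result into an in-memory table and run the outer SELECT on it."""
    con = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{c}"' for c in columns)
        con.execute(f"CREATE TABLE handler_result ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        # Only the columns the handler reports are loaded; anything else is lost.
        con.executemany(
            f"INSERT INTO handler_result VALUES ({placeholders})",
            [tuple(row.get(c) for c in columns) for row in rows],
        )
        return con.execute(sql).fetchall()
    finally:
        con.close()


rows = [{"city": "Lisbon", "sessions": 10}, {"city": "Porto", "sessions": 5}]
# All columns kept: the outer expression evaluates normally.
print(run_outer_select(rows, ["city", "sessions"],
                       "SELECT UPPER(city), sessions * 2 FROM handler_result ORDER BY sessions"))
```

If the handler had reported only `["city"]`, the same outer query would fail with `sqlite3.OperationalError` (no such column). The real pipeline hits the equivalent failure in DuckDB.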
63 changes: 63 additions & 0 deletions default_handlers.txt
@@ -0,0 +1,63 @@
file # Required by the core codebase
postgres
mysql
openai
web
langchain # For agents & completions
# sabido default handlers
airtable
bigquery
binance
clickhouse
coinbase
databricks
discord
dropbox
email
github
gitlab
gmail
tripadvisor
google_ads
google_analytics
google_search
google_books
google_calendar
google_fit
hackernews
hubspot
linkedin_ads
intercom
jira
microsoft_ads
ms_one_drive
ms_teams
mssql
newsapi
notion
oilpriceapi
openstreetmap
oracle
paypal
pgvector
reddit
s3
s3vectors
salesforce
sharepoint
sheets
shopify
slack
snowflake
sentry
statsforecast
strava
stripe
twilio
twitter
web
youtube
xero
zendesk
zipcodebase
zotero
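The file lists one handler name per line, and `#` starts a comment, either on its own line or after a name. The parsing itself lives in `setup.py`, which is not shown in this diff. Assuming that is the whole format, a minimal reader might look like the sketch below. `parse_default_handlers` and `load_default_handlers` are hypothetical names, and the de-duplication is a defensive choice, not confirmed behaviour.

```python
from pathlib import Path
from typing import List


def parse_default_handlers(text: str) -> List[str]:
    """Return handler names in file order, ignoring comments, blank lines and repeats."""
    names: List[str] = []
    seen = set()
    for raw_line in text.splitlines():
        name = raw_line.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def load_default_handlers(path: str = "default_handlers.txt") -> List[str]:
    """Read and parse the handler list from disk."""
    return parse_default_handlers(Path(path).read_text(encoding="utf-8"))
```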
38 changes: 29 additions & 9 deletions docker/mindsdb.Dockerfile
@@ -1,7 +1,7 @@
# This stage's objective is to gather ONLY requirements.txt files and anything else needed to install deps.
# This stage will be run almost every build, but it is fast and the resulting layer hash will be the same unless a deps file changes.
# We do it this way because we can't copy all requirements files with a glob pattern in docker while maintaining the folder structure.
FROM python:3.10 AS deps
FROM python:3.10-slim AS deps
WORKDIR /mindsdb

# Copy everything to begin with
@@ -12,7 +12,7 @@ RUN find ./ -type f -not -name "requirements*.txt" -print0 | xargs -0 rm -f \
# Find every empty directory and delete it
&& find ./ -type d -empty -delete
# Copy setup.py and everything else used by setup.py
COPY setup.py README.md ./
COPY setup.py default_handlers.txt README.md ./
COPY mindsdb/__about__.py mindsdb/
# Now this stage only contains a few files and the layer hash will be the same if they don't change.
# Which will mean the next stage can be cached, even if the cache for the above stage was invalidated.
@@ -21,7 +21,7 @@ COPY mindsdb/__about__.py mindsdb/


# Use the stage from above to install our deps with as much caching as possible
FROM python:3.10 AS build
FROM python:3.10-slim AS build
WORKDIR /mindsdb

# Configure apt to retain downloaded packages so we can store them in a cache mount
@@ -34,7 +34,7 @@ RUN --mount=target=/var/lib/apt,type=cache,sharing=locked \
&& apt-get install -qy \
-o APT::Install-Recommends=false \
-o APT::Install-Suggests=false \
freetds-dev freetds-bin libpq5 curl unixodbc unixodbc-dev gnupg # freetds-dev required to build pymssql on arm64 for mssql_handler. Can be removed when we are on python3.11+
build-essential freetds-dev freetds-bin libpq5 curl unixodbc unixodbc-dev gnupg # build-essential is required to compile C extensions such as quadprog; freetds-dev is required to build pymssql on arm64 for mssql_handler and can be removed once we are on Python 3.11+

# Install Microsoft ODBC Driver 18 for SQL Server
# Use Debian 12 (bookworm) repo as it's the latest stable version supported by Microsoft
@@ -97,8 +97,8 @@ EXPOSE 47335/tcp
RUN python -m mindsdb --config=/root/mindsdb_config.json --load-tokenizer --update-gui

# Same as extras image, but with dev dependencies installed.
# This image is used in our docker-compose
FROM extras AS dev
# This image is used in our docker-compose and for local development with volume mounting
FROM build AS dev
WORKDIR /mindsdb

# Configure apt to retain downloaded packages so we can store them in a cache mount
@@ -114,12 +114,32 @@ RUN --mount=target=/var/lib/apt,type=cache,sharing=locked \
-o APT::Install-Suggests=false \
libpq5 freetds-bin curl

# Install dev requirements and install 'mindsdb' as an editable package
RUN --mount=type=cache,target=/root/.cache uv pip install -r requirements/requirements-dev.txt \
&& uv pip install --no-deps -e "."
# Copy requirements files to install dev dependencies
COPY --from=deps /mindsdb/requirements requirements/
# Install dev requirements
RUN --mount=type=cache,target=/root/.cache uv pip install -r requirements/requirements-dev.txt

# Copy minimal files needed for editable install
COPY setup.py default_handlers.txt README.md ./
COPY mindsdb/__about__.py mindsdb/__about__.py

# Install mindsdb in editable mode so the installed package resolves to the source under /mindsdb
# When we mount the volume, the editable install will use the mounted code
RUN --mount=type=cache,target=/root/.cache uv pip install --no-deps -e "."

# Copy code (will be overridden by volume mount in docker run)
COPY . .

COPY docker/mindsdb_config.release.json /root/mindsdb_config.json

ENV PYTHONUNBUFFERED=1
ENV MINDSDB_DOCKER_ENV=1
ENV VIRTUAL_ENV=/venv
ENV PATH=/venv/bin:$PATH

EXPOSE 47334/tcp
EXPOSE 47335/tcp

ENTRYPOINT [ "bash", "-c", "watchfiles --filter python 'python -Im mindsdb --config=/root/mindsdb_config.json --api=http,mysql' mindsdb" ]
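To confirm that the editable install really resolves `mindsdb` to the mounted source, you could run a check like the one below inside the dev container. This is a sketch: `package_dir` and `is_served_from` are hypothetical helpers, and `/mindsdb` is taken from the `WORKDIR` above.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Directory the import system would load top-level `name` from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:  # missing module or namespace package
        return None
    return Path(spec.origin).resolve().parent


def is_served_from(name: str, expected_root: str) -> bool:
    """True if package `name` is imported from somewhere under `expected_root`."""
    pkg = package_dir(name)
    return pkg is not None and pkg.is_relative_to(Path(expected_root).resolve())


if __name__ == "__main__":
    # Expected to print True inside the dev container once the editable install
    # and the volume mount are both in place.
    print(is_served_from("mindsdb", "/mindsdb"))
```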

