feat: Add infer_schema with confidence-scored type inference (#855) #2580
Open
dynamo-pentester wants to merge 1 commit into
Summary
Implements automatic schema/data type inference with confidence scoring, as scoped in #855.
Adds `infer_schema(frame: ArFrame) -> InferredSchema` to the quality layer. For each column, it returns a `ColumnInference` with a best-guess `inferred_type`, a deterministic `confidence` score in `[0.0, 1.0]`, an `is_ambiguous` flag, and the full `candidates` score breakdown.
Closes #855
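To make the shape of a per-column result concrete, here is a minimal, self-contained sketch. It is not the PR's code: `ColumnInferenceSketch`, `summarize_candidates`, and the `name` field are hypothetical, and the ambiguity rule (flag a column when the top two candidate scores are closer than the threshold) is an assumption about how `_AMBIGUITY_THRESHOLD = 0.15` might be applied.

```python
from dataclasses import dataclass
from typing import Dict

# Assumption: same value as the PR's _AMBIGUITY_THRESHOLD; how the PR applies it is not shown.
AMBIGUITY_THRESHOLD = 0.15


@dataclass(frozen=True)
class ColumnInferenceSketch:
    """Hypothetical stand-in for the PR's ColumnInference, not the real class."""

    name: str  # hypothetical field, added so the example is readable
    inferred_type: str
    confidence: float
    is_ambiguous: bool
    candidates: Dict[str, float]

    def to_dict(self) -> dict:
        # Emit candidates sorted by type name so the output order is deterministic.
        return {
            "name": self.name,
            "inferred_type": self.inferred_type,
            "confidence": self.confidence,
            "is_ambiguous": self.is_ambiguous,
            "candidates": {k: self.candidates[k] for k in sorted(self.candidates)},
        }


def summarize_candidates(name: str, candidates: Dict[str, float]) -> ColumnInferenceSketch:
    """Pick the best-scoring type and flag near-ties (assumed rule)."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Rank by score descending, then by type name, so ties resolve deterministically.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    best_type, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Clamp the reported confidence into [0.0, 1.0].
    confidence = min(1.0, max(0.0, best_score))
    is_ambiguous = (best_score - runner_up) < AMBIGUITY_THRESHOLD
    return ColumnInferenceSketch(name, best_type, confidence, is_ambiguous, dict(candidates))
```

Under this assumed rule, a column scoring `int64` at 0.9 and `float64` at 0.8 is reported as `int64` but flagged ambiguous, because the gap between the two scores is below 0.15.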
What changed
`arnio/quality.py`
- `ColumnInference` and `InferredSchema` frozen dataclasses
- `infer_schema(frame)`: a thin layer over the existing `profile()` output (reuses `suggested_dtype`, `semantic_type`, `null_ratio`, `unique_ratio`; no second parsing engine)
- `_INFER_CANDIDATE_TYPES` (the six supported types: `int64`, `float64`, `bool`, `datetime`, `categorical`, `string`) and `_AMBIGUITY_THRESHOLD = 0.15` in one place
- `ColumnInference.to_dict()` / `InferredSchema.to_dict()`: fully JSON-safe, deterministic column/candidate ordering
- `InferredSchema.to_schema()`: maps only to the dtypes `Field`/`Schema` already accept (`int64`, `float64`, `bool`, `datetime`, `string`); `categorical` → `string`. All fields default to `nullable=True` (conservative default)
- `from .schema import Field, Schema` import for the `to_schema()` return type
`arnio/__init__.py`
- `ColumnInference`, `InferredSchema`, `infer_schema`
`tests/test_infer_schema.py` (new)
- `yes/no`, `true/false`, `1/0`
- `[0.0, 1.0]`
- `to_dict()` JSON serialization and ordering
- `to_schema()` → valid `Schema`, correct dtype mappings, usable with `ar.validate()`
- `TypeError` on non-`ArFrame`
`website/api.html`
- `infer_schema(frame)` entry to the Quality section, documenting `ColumnInference` fields and `InferredSchema` methods
Out of scope (untouched)
- `profile()`, `compare_profiles()`, `detect_drift()` / `DriftReport`
- `Schema`, `Field`, validators (only consumed via `to_schema()`)
How to test
Example usage
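A minimal sketch of the dtype mapping that `to_schema()` is described as performing, written with plain dicts rather than arnio's `Field`/`Schema` classes so that it runs on its own. The helper name `to_schema_fields` and the dict layout are assumptions for illustration; only the accepted dtypes, the `categorical` → `string` fallback, and the `nullable=True` default come from the description above.

```python
from typing import Dict, List

# Dtypes the PR description says Field/Schema already accept.
SCHEMA_DTYPES = ("int64", "float64", "bool", "datetime", "string")


def to_schema_fields(inferred: Dict[str, str]) -> List[dict]:
    """Map inferred column types to plain field specs (hypothetical helper)."""
    fields = []
    for column, inferred_type in inferred.items():
        # categorical is not among the accepted schema dtypes, so it maps to string.
        dtype = "string" if inferred_type == "categorical" else inferred_type
        if dtype not in SCHEMA_DTYPES:
            raise ValueError(f"unsupported inferred type: {inferred_type!r}")
        # Conservative default: every field is nullable.
        fields.append({"name": column, "dtype": dtype, "nullable": True})
    return fields


fields = to_schema_fields({"id": "int64", "city": "categorical"})
for field in fields:
    print(field)
```

Keeping every field nullable means a schema built this way should not reject data merely because a column contains missing values; tightening nullability is left to the caller.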