This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

Naming harmonization between pyrog and river #634

Open
simonvadee opened this issue Sep 21, 2021 · 4 comments
Labels
help wanted Extra attention is needed prio/low proj/new-pyrog quality code quality, stability, tests, refactoring question Further information is requested

Comments

@simonvadee
Contributor

This is meant to be an open discussion before actually starting to rename stuff across the back and the front.

Problem

We don't use the same terminology for the same concepts across the web application and the back-end. I think we can use this thread to discuss naming in general (where it can be controversial) and take this opportunity to harmonize the terminology we use.

Description

Credentials vs Database: we currently use Credential to refer to database connection information (host, port, login, password, database name), but it can be confusing ("credentials of what? a user? to what?").

Owner vs Schema: a database may contain many "schemas". I think back in the day @Jasopaum and I were confused about the difference between the two terms (and I think the same word has a different meaning in Postgres, MSSQL and Oracle). We currently use Owner, but I think DatabaseSchema or just Schema would be more accurate.

Source vs something else?: a Source has a Credential (i.e. it is linked to a database) and has many Resources. It is meant to represent a "source of information" from which we want to extract data in order to create FHIR resources. For now, it can only be an SQL database, and maybe that's fine as long as the pyrog scope remains unclear. For instance, when the data source is a "flux" (e.g. an SFTP server with CSV files), do we want pyrog/river to be aware of this? This comes back to the datalake question and I'm not sure we want to address it here. However, don't hesitate if you have suggestions!

Resource vs Mapping or Mappings: this is the term for which we have the most ambiguity right now. It is called Resource in the back-end and Mapping in the webapp (lol what a great idea we had). I think we all agree that Resource is too vague and refers to too many concepts (even in software engineering in general). Mapping is a better word for this concept, but should we use the plural or the singular?

Column: is meant to represent a database column, but it also has table and owner fields. I think this one is fine (until we normalize the schema and use a single Column object for a column of the database) but I mention it anyway.
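To make the proposed renames concrete, here is a hypothetical sketch of how the entities relate under the names suggested above (Database instead of Credential, schema instead of owner, Mapping instead of Resource, and Source kept for now since its name is still open). All class and field names are illustrative assumptions, not the actual pyrog/river models.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of the proposed naming; NOT the real pyrog models.


@dataclass
class Database:
    """Proposed replacement for `Credential`: how to connect to one database."""
    host: str
    port: int
    login: str
    password: str
    database_name: str


@dataclass
class Column:
    """A database column, still carrying its table and schema (formerly `owner`)."""
    schema: str
    table: str
    name: str


@dataclass
class Mapping:
    """Proposed replacement for `Resource`: rules from one source table to one FHIR resource."""
    source_table: str
    fhir_resource_type: str  # e.g. "Patient"; illustrative field name
    columns: List[Column] = field(default_factory=list)


@dataclass
class Source:
    """Name still under discussion: a Source has one Database and many Mappings."""
    name: str
    database: Database
    mappings: List[Mapping] = field(default_factory=list)


# Usage with made-up values
db = Database(host="localhost", port=5432, login="pyrog",
              password="secret", database_name="hospital")
patient = Mapping(
    source_table="patients",
    fhir_resource_type="Patient",
    columns=[Column(schema="public", table="patients", name="birth_date")],
)
source = Source(name="demo", database=db, mappings=[patient])
print(source.mappings[0].columns[0].schema)  # public
```

With this shape, a singular Mapping reads naturally: each one is a single table-to-resource correspondence, and a Source simply holds a list of them.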

Implementation

First, let's agree on the naming. Then we can open one PR per renamed concept (each one means a new database migration in the back-end and updates to both the front-end and back-end code).

@simonvadee simonvadee added help wanted Extra attention is needed question Further information is requested quality code quality, stability, tests, refactoring labels Sep 21, 2021
@BPierrick
Contributor

Mapping is a better word for this concept, but should we use the plural or the singular?

I don't see any problem with using the singular, even though we may sometimes manipulate several of these objects at once.

I also have a suggestion about the Attribute.path field: it has the same name as ElementDefinition.path but is not the same concept as a FHIR path at all. This may lead to confusion, especially in the FhirResourceTree.

@MiskoG

MiskoG commented Sep 22, 2021

Great initiative @simonvadee 👍

  • I would totally go for Database. It's simpler and more straightforward than Credentials.

  • DatabaseSchema is also a good idea. I think I prefer it to just Schema since it is more precise, but I don't have a strong opinion.

  • Great question about Source, too. If it were me, I would go for something broader, for example Project. So far, a "source" has indeed been a piece of medical software, Chimio or Millennium for instance. But as you said @simonvadee, it could also be an HL7 data flow. Another reason I like a term like Project is that it doesn't bias how Pyrog is used: maybe in the future we will find it more convenient to split a mapping across several projects, even for the same "source". Otherwise, maybe we can wait for input from @elsiehoffet-94 and @nriss on this.

  • I think Mapping is indeed a better idea than Resource. To me, what this object is about is a mapping (well, a set of rules) between a source table (possibly filtered, joined... and with a PK defined) and a FHIR resource. What I didn't like about Resource is that, to me, it only referred to the last part of the mapping (the destination). Also, I think Mapping should stay singular, as it doesn't refer to the many "column => FHIR attribute" mappings inside it but to the broader correspondence between one source table and a FHIR resource. For example, this one:
    [Screenshot: CleanShot 2021-09-22 at 10 48 15@2x]

  • Column: I don't have a strong opinion on this one.

@elsiehoffet-94
Contributor

Yes, Source should definitely be renamed; Project fits better and is more flexible (many projects for one database, different kinds of data origin...).
Mapping seems fine to me, but I can anticipate some confusion: what about code mappings (between terminologies, a.k.a. ConceptMaps), and what do we call the DBT rules? @nriss, any idea about the latter?

@nriss
Contributor

nriss commented Sep 29, 2021

Credentials vs Database: we currently use Credential to refer to database connection information (host, port, login, password, database name), but it can be confusing ("credentials of what? a user? to what?").

I suggest using the same word as Airbyte: connection.
Just an idea to think about: what if these connections were defined in a separate part of pyrog, so that when we create a source we could choose from predefined connections?

Owner vs Schema: a database may contain many "schemas". I think back in the day @Jasopaum and I were confused about the difference between the two terms (and I think the same word has a different meaning in Postgres, MSSQL and Oracle). We currently use Owner, but I think DatabaseSchema or just Schema would be more accurate.

In Airbyte, the form adapts depending on the chosen database type.

Source vs something else?: a Source has a Credential (i.e. it is linked to a database) and has many Resources. It is meant to represent a "source of information" from which we want to extract data in order to create FHIR resources. For now, it can only be an SQL database, and maybe that's fine as long as the pyrog scope remains unclear. For instance, when the data source is a "flux" (e.g. an SFTP server with CSV files), do we want pyrog/river to be aware of this? This comes back to the datalake question and I'm not sure we want to address it here. However, don't hesitate if you have suggestions!

I agree with you @elsiehoffet-94 and @MiskoG: Project seems great, I don't see a better word for now.

Resource vs Mapping or Mappings: this is the term for which we have the most ambiguity right now. It is called Resource in the back-end and Mapping in the webapp (lol what a great idea we had). I think we all agree that Resource is too vague and refers to too many concepts (even in software engineering in general). Mapping is a better word for this concept, but should we use the plural or the singular?

Mapping is OK for me. Why are you hesitating between singular and plural? It depends on the situation, no? I don't know which is best.
What do you mean by DBT rules, @elsiehoffet-94? Is it the SQL query that generates the dbt views? In my opinion there is no need to name it, because it is seen as a regular table in pyrog.

Column: is meant to represent a database column, but it also has table and owner fields. I think this one is fine (until we normalize the schema and use a single Column object for a column of the database) but I mention it anyway.

👍

@MiskoG MiskoG removed their assignment Oct 25, 2021
Projects
None yet
Development

No branches or pull requests

9 participants