- Use random_index to subsample:
-- randomly selects one out of every million objects
SELECT *
FROM gaiadr1.gaia_source
WHERE MOD(random_index, 1000000) = 0
- When cross-matching by position, do
1 = CONTAINS(POINT(smaller_catalog), CIRCLE(larger_catalog))
-
SIN(x), COS(x), TAN(x): all in radians
-
ASIN(x), ACOS(x), ATAN(x)
-
ATAN2(x, y): arctan(y/x) -> [-pi, +pi]
-
DEGREES(x): radians to degrees
-
RADIANS(x): degrees to radians
-
EXP(x)
-
LOG(x): natural log
-
LOG10(x)
-
POWER(x, y): x**y
-
SQRT(x)
-
ROUND(x, n): round to n decimal places. positive: to the right, negative: to the left of the decimal point.
-
FLOOR(x)
-
CEILING(x)
-
TRUNCATE(x, n)
-
ABS(x)
-
RAND(n)?
-
MOD(x, y): x mod y
-
PI()
All angle in degrees.
- POINT(coordsys, lon, lat)
- BOX(coordsys, longitudeCentre, latitudeCentre, longitudeExtent, latitudeExtent)
- CIRCLE(coordsys, longitudeCentre, latitudeCentre, radius)
- POLYGON(coordsys, longitude_1, latitude_1, ..., longitude_N, latitude_N)
- CONTAINS(geometry_1, geometry_2) (=1 : true, =0 : false)
- INTERSECTS(geometry_1, geometry_2)
- AREA(geometry): area in square degrees
- DISTANCE(point1, point2): great circle distance in degrees
- Put column names with spaces in double quotes ("").
- Put column values to compare in single quotes ('').
- Arithmetic operations are for columns. For operations between rows, use aggregate functions.
- comments:
-- single line
/* multiple
lines*/
-
LIKE: pattern matching
%
: wildcard e.g.,WHERE column LIKE 'abc%'
_
: any individual character e.g.,WHERE column LIKE 'ab_de'
- can be case sensitive or insensitive depending on DB?
-
IN: compare to a set of values
WHERE year in (2009, 2010)
WHERE flag in ('big', 'small')
-
BETWEEN AND: range of values incl. bounds
WHERE year_rank BETWEEN 5 AND 10
is exactly same asWHERE year_rank >= 5 AND year_rank <= 10
-
IS NULL: missing or not
= NULL
does not work since=
is arithmetic comparison
-
NOT
- year_rank NOT BETWEEN 2 AND 3
- "group" NOT LIKE '%macklemore%'
- artist IS NOT NULL
-
ORDER BY
- ascending by default; use ORDER BY column DESC for descending
- multiple columns:
ORDER BY year DESC, year_rank
-
COUNT
COUNT(*)
: count all rowsCOUNT(column)
: count all rows in column that is not NULL
-
SUM only on numerical columns
-
MIN, MAX works on non-numerical columns (alphabetical)
-
AVG on numerical columns, NULL rows are ignored
-
GROUP BY
- multiple columns: GROUP BY col1, col2
- HAVING: filter on aggregates
- order matters: select -> from -> where -> group by -> having -> order by
- JOIN = INNER JOIN: unmatched rows are dropped.
- LEFT/RIGHT JOIN: find match for every row in left, right table. Includes duplicates.
- FULL JOIN = FULL OUTER JOIN: union of two tables
- filter in JOIN by doing: JOIN table ON table.col1 = table2.col2 AND [condition]
- Casting - supported for adql?
- CAST(column AS integer) or column::integer
- Cleaning strings
- LEFT(column, 5): 5 characters from the left
- RIGHT(column, 5)
- LENGTH(column)
- TRIM([leading/trailing/both] 'characters' FROM column)
SELECT *
FROM (
-- inner query
SELECT ...
) sub
WHERE ...
- Subqueries are required to have names, which are added after parentheses.
When are subqueries useful?
- aggregate in multiple stages, e.g., average count for col1 grouped by col2
- with conditional logic
- treated as single value or multiple values not a table.
- do not put alias
... WHERE column = (SELECT MIN(col2) FROM ...)
... WHERE column IN (SELECT col2 FROM ...)
- subquery to pre-process columns in the same table, and self-join