Release 3.1.3 (#286)
Some minor fixes from beta.

Also removed the one-column warning.

Still with manual version bump - Hopefully all works well in terms of the package versioning. I'll automate the version bumps later on.
harelba authored Nov 26, 2021
1 parent f6c0299 commit c14ac80
Showing 11 changed files with 424 additions and 225 deletions.
44 changes: 22 additions & 22 deletions .github/workflows/build-and-package.yaml
@@ -139,12 +139,12 @@ jobs:
gem install fpm
cp dist/fpm-config ~/.fpm
fpm -s dir -t deb --deb-use-file-permissions -p packages/linux/q-text-as-data-3.1.1-beta-1.x86_64.deb --version 3.1.1-beta ./linux-q=/usr/bin/q USAGE.gz=/usr/share/man/man1/q.1.gz
fpm -s dir -t deb --deb-use-file-permissions -p packages/linux/q-text-as-data-3.1.3-1.x86_64.deb --version 3.1.3 ./linux-q=/usr/bin/q USAGE.gz=/usr/share/man/man1/q.1.gz
- name: Upload DEB Package
uses: actions/[email protected]
with:
name: q-text-as-data-3.1.1-beta-1.x86_64.deb
path: packages/linux/q-text-as-data-3.1.1-beta-1.x86_64.deb
name: q-text-as-data-3.1.3-1.x86_64.deb
path: packages/linux/q-text-as-data-3.1.3-1.x86_64.deb

test-deb-packaging:
runs-on: ubuntu-18.04
@@ -155,7 +155,7 @@ jobs:
- name: Download DEB
uses: actions/download-artifact@v2
with:
name: q-text-as-data-3.1.1-beta-1.x86_64.deb
name: q-text-as-data-3.1.3-1.x86_64.deb
- name: Install Python for Testing
uses: actions/setup-python@v2
with:
@@ -167,7 +167,7 @@ jobs:
pip3 install -r test-requirements.txt
- name: Test DEB Package Installation
run: ./dist/test-using-deb.sh ./q-text-as-data-3.1.1-beta-1.x86_64.deb
run: ./dist/test-using-deb.sh ./q-text-as-data-3.1.3-1.x86_64.deb

package-linux-rpm:
needs: [test-linux, create-man]
@@ -199,12 +199,12 @@ jobs:
gem install fpm
cp dist/fpm-config ~/.fpm
fpm -s dir -t rpm --rpm-use-file-permissions -p packages/linux/q-text-as-data-3.1.1-beta.x86_64.rpm --version 3.1.1-beta ./linux-q=/usr/bin/q USAGE.gz=/usr/share/man/man1/q.1.gz
fpm -s dir -t rpm --rpm-use-file-permissions -p packages/linux/q-text-as-data-3.1.3.x86_64.rpm --version 3.1.3 ./linux-q=/usr/bin/q USAGE.gz=/usr/share/man/man1/q.1.gz
- name: Upload RPM Package
uses: actions/[email protected]
with:
name: q-text-as-data-3.1.1-beta.x86_64.rpm
path: packages/linux/q-text-as-data-3.1.1-beta.x86_64.rpm
name: q-text-as-data-3.1.3.x86_64.rpm
path: packages/linux/q-text-as-data-3.1.3.x86_64.rpm

test-rpm-packaging:
runs-on: ubuntu-18.04
@@ -215,9 +215,9 @@ jobs:
- name: Download RPM
uses: actions/download-artifact@v2
with:
name: q-text-as-data-3.1.1-beta.x86_64.rpm
name: q-text-as-data-3.1.3.x86_64.rpm
- name: Retest using RPM
run: ./dist/test-using-rpm.sh ./q-text-as-data-3.1.1-beta.x86_64.rpm
run: ./dist/test-using-rpm.sh ./q-text-as-data-3.1.3.x86_64.rpm

build-mac:
runs-on: macos-11
@@ -308,7 +308,7 @@ jobs:
export BRANCH_NAME=master
# TODO temp, since template rendering action doesn't work in mac
cat .github/workflows/q.rb.brew-formula-template | sed 's/{{ .Q_VERSION }}/3.1.1-beta/g' | sed "s/{{ .Q_BRANCH_NAME }}/${BRANCH_NAME}/g" > ./brew/q.rb
cat .github/workflows/q.rb.brew-formula-template | sed 's/{{ .Q_VERSION }}/3.1.3/g' | sed "s/{{ .Q_BRANCH_NAME }}/${BRANCH_NAME}/g" > ./brew/q.rb
echo "Resulting formula:"
cat ./brew/q.rb
@@ -322,8 +322,8 @@ jobs:
- name: Upload Executable
uses: actions/[email protected]
with:
name: q--3.1.1-beta_1.big_sur.bottle.tar.gz
path: ./q--3.1.1-beta_1.big_sur.bottle.tar.gz
name: q--3.1.3_1.big_sur.bottle.tar.gz
path: ./q--3.1.3_1.big_sur.bottle.tar.gz

# TODO auto-create PR to main homebrew-core
# git clone https://github.com/harelba/homebrew-core.git
@@ -340,7 +340,7 @@ jobs:
- name: Download q bottle
uses: actions/download-artifact@v2
with:
name: q--3.1.1-beta_1.big_sur.bottle.tar.gz
name: q--3.1.3_1.big_sur.bottle.tar.gz
- name: Test the created bottle
run: |
set -x -e
@@ -349,7 +349,7 @@ jobs:
WD=$(pwd)
pushd /usr/local/Cellar
tar xvfz ${WD}/q--3.1.1-beta_1.big_sur.bottle.tar.gz
tar xvfz ${WD}/q--3.1.3_1.big_sur.bottle.tar.gz
popd
brew link q
@@ -459,17 +459,17 @@ jobs:
# TODO Windows versions do not support the -beta postfix
export Q_MSI=./build/x86_64-pc-windows-msvc/release/msi_installer/q-text-as-data-3.1.1.msi
export Q_MSI=./build/x86_64-pc-windows-msvc/release/msi_installer/q-text-as-data-3.1.3.msi
chmod 755 $Q_MSI
mkdir -p packages/windows/
cp $Q_MSI packages/windows/q-text-as-data-3.1.1.msi
cp $Q_MSI packages/windows/q-text-as-data-3.1.3.msi
- name: Upload Windows MSI
uses: actions/[email protected]
with:
name: q-text-as-data-3.1.1.msi
path: packages/windows/q-text-as-data-3.1.1.msi
name: q-text-as-data-3.1.3.msi
path: packages/windows/q-text-as-data-3.1.3.msi

test-windows-packaging:
needs: package-windows
@@ -480,12 +480,12 @@ jobs:
- name: Download Windows Package
uses: actions/download-artifact@v2
with:
name: q-text-as-data-3.1.1.msi
name: q-text-as-data-3.1.3.msi
- name: Test Install of MSI
continue-on-error: true
shell: powershell
run: |
$process = Start-Process msiexec.exe -ArgumentList "/i q-text-as-data-3.1.1.msi -l* msi-install.log /norestart /quiet" -PassThru -Wait
$process = Start-Process msiexec.exe -ArgumentList "/i q-text-as-data-3.1.3.msi -l* msi-install.log /norestart /quiet" -PassThru -Wait
$process.ExitCode
gc msi-install.log
@@ -494,7 +494,7 @@ jobs:
continue-on-error: true
shell: powershell
run: |
$process = Start-Process msiexec.exe -ArgumentList "/u q-text-as-data-3.1.1.msi /norestart /quiet" -PassThru -Wait
$process = Start-Process msiexec.exe -ArgumentList "/u q-text-as-data-3.1.3.msi /norestart /quiet" -PassThru -Wait
$process.ExitCode
exit $process.ExitCode
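
The commit message notes that the version bump is still manual, and the hunks above show the same version string repeated in every artifact name. One possible way to reduce that repetition is sketched below. The helper functions and the `Q_VERSION` default are hypothetical and are not part of the repository; only the naming patterns are taken from this diff, including the Windows behavior of dropping the `-beta` postfix.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the q repository): derive every artifact
# name from a single version string so a release needs only one edit.
Q_VERSION="${Q_VERSION:-3.1.3}"

# Patterns copied from the workflow hunks above.
deb_name()    { printf '%s\n' "q-text-as-data-${1}-1.x86_64.deb"; }
rpm_name()    { printf '%s\n' "q-text-as-data-${1}.x86_64.rpm"; }
bottle_name() { printf '%s\n' "q--${1}_1.big_sur.bottle.tar.gz"; }
# Windows versions do not support the -beta postfix, so strip
# everything from the first '-' onward (3.1.1-beta becomes 3.1.1).
msi_name()    { printf '%s\n' "q-text-as-data-${1%%-*}.msi"; }

# Print the full set of names for the chosen version.
for fn in deb_name rpm_name bottle_name msi_name; do
  "$fn" "$Q_VERSION"
done
```

With a helper like this, a workflow step could export the names once and reuse them in the `fpm`, upload, and download steps, instead of editing 22 lines per release.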
3 changes: 1 addition & 2 deletions .github/workflows/q.rb.brew-formula-template
@@ -5,8 +5,7 @@ class Q < Formula
desc "Run SQL directly on CSV or TSV files"
homepage "https://harelba.github.io/q/"
# Using branch name for pre-releases, for tagged releases this would be the version tag, and not "version" part will be needed
url "https://github.com/harelba/q/archive/{{ .Q_BRANCH_NAME }}.tar.gz"
version "{{ .Q_VERSION }}"
url "https://github.com/harelba/q/archive/3.1.3.tar.gz"

# Removed for now, until everything is finalized
# sha256 "0844aed6658d0347a299b84bee978c88724d45093e8cbd7b05506ecc0b93c98c"
73 changes: 13 additions & 60 deletions QSQL-NOTES.md
@@ -1,57 +1,22 @@

# New beta version 3.1.1-beta is available
Installation instructions [at the end of this document](QSQL-NOTES.md#installation-of-the-new-beta-release)

Contains a lot of major changes, see sections below for details.

## Basic Example of using the caching
```bash
# Prepare some data
$ seq 1 1000000 > myfile.csv

# read from the resulting file (-c 1 just prevents the warning of having one column only)
$ time q -c 1 "select sum(c1),count(*) from myfile.csv"
500000500000 1000000
q -c 1 "select sum(c1),count(*) from myfile.csv" 4.02s user 0.06s system 99% cpu 4.108 total

# Running with `-C readwrite` auto-creates a cache file if there is none. The cache filename would be myfile.csv.qsql. The query runs as usual
$ time q -c 1 "select sum(c1),count(*) from myfile.csv" -C readwrite
time q -c 1 "select sum(c1),count(*) from myfile.csv" -C readwrite
500000500000 1000000
q -c 1 "select sum(c1),count(*) from myfile.csv" -C readwrite 3.96s user 0.08s system 99% cpu 4.057 total

# Now run with `-C read`. The query will run from the cache file and not the original. As the file gets bigger, the difference will be much more noticable
$ time q -c 1 "select sum(c1),count(*) from myfile.csv" -C read
500000500000 1000000
q -c 1 "select sum(c1),count(*) from myfile.csv" -C read 0.17s user 0.05s system 94% cpu 0.229 total

# Now let's try another query on that file. Notice the short query duration. The cache is being used for any query that uses this file, and queries on multiple files that contain caches will reuse the cache as well.
$ time q -c 1 "select avg(c1) from myfile.csv" -C read
500000.5
q -c 1 "select avg(c1) from myfile.csv" -C read 0.16s user 0.05s system 99% cpu 0.217 total

# You can also query the qsql file directly
$ time q -c 1 "select sum(c1),count(*) from myfile.csv.qsql"
500000500000 1000000
q -c 1 "select sum(c1),count(*) from myfile.csv.qsql" 0.17s user 0.05s system 95% cpu 0.226 total

# Now let's delete the original csv file
$ rm -vf myfile.csv

# Running another query directly on the qsql file just works
$ q -c 1 "select sum(c1),count(*) from myfile.csv.qsql"
500000500000 1000000
q -c 1 "select sum(c1),count(*) from myfile.csv.qsql" 0.17s user 0.04s system 94% cpu 0.226 total

# See the `.qrc` section below if you want to set the default `-C` (`--caching-mode`) to something other than `none` (the default)
```
## Major changes and additions in the new `3.x` version
This is the list of new/changed functionality in this version. Large changes, please make sure to read the details if you're already using q.

* **Automatic Immutable Caching** - Automatic caching of data files (into `<my-csv-filename>.qsql` files), with huge speedups for medium/large files. Enabled through `-C readwrite` or `-C read`
* **Direct querying of standard sqlite databases** - Just use it as a table name in the query. Format is `select ... from <sqlitedb_filename>:::<table_name>`, or just `<sqlitedb_filename>` if the database contains only one table. Multiple separate sqlite databases are fully supported in the same query.
* **Direct querying of the `qsql` cache files** - The user can query directly from the `qsql` files, removing the need for the original files. Just use `select ... from <my-csv-filename>.qsql`. Please wait until the non-beta version is out before thinking about deleting any of your original files...
* **Revamped `.qrc` mechanism** - allows opting-in to caching without specifying it in every query. By default, caching is **disabled**, for backward compatibility and for finding usability issues.
* **Save-to-db is now reusable for queries** - `--save-db-to-disk` option (`-S`) has been enhanced to match the new capabilities. You can query the resulting file directly through q, using the method mentioned above (it's just a standard sqlite database).
* **Only python3 is supported from now on** - Shouldn't be an issue, since q is a self-contained binary executable which has its own python embedded in it. Internally, q is now packaged with Python 3.8. After everything cools down, I'll probably bump this to 3.9/3.10.
* **Minimal Linux Version Bumped** - Works with CentOS 8, Ubuntu 18.04+, Debian 10+. Currently only for x86_64. Depends on glibc version 2.25+. Haven't tested it on other architectures. Issuing other architectures will be possible later on
* **Completely revamped binary packaging** - Using [pyoxidizer](https://github.com/indygreg/PyOxidizer)

The following sections provide the details of each of the new functionality in this major version.
The following sections provide the details of each of the new functionalities in this major version.

## Automatic caching of data files
Speeding up subsequent reads from the same file by several orders of magnitude by automatically creating an immutable cache file for each tabular text file.

For example, reading a 0.9GB file with 1M rows and 100 columns without caching takes ~50 seconds. When the cache exists, querying the same file will take less than 1 second. Obviously, the cache can be used in order to perform any query and not just the original query that was used for creating the cache.
For example, reading a 0.9GB file with 1M rows and 100 columns without caching takes ~50 seconds. When the cache exists, querying the same file will take around ~1-2 seconds. Obviously, the cache can be used in order to perform any query and not just the original query that was used for creating the cache.

When caching is enabled, the cache is created on the first read of a file, and used automatically when reading it in other queries. A separate cache is being created for each file that is being used, allowing reuse in multiple use-cases. For example, if two csv files each have their own cache file from previous queries, then running a query that JOINs these two files would use the caches as well (without loading the data into memory), speeding it up considerably.
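
The JOIN case described above could look roughly like the following sketch. The files `orders.csv` and `customers.csv`, their columns, and the query are made up for illustration. The script only runs the queries if a `q` executable is found on the `PATH`.

```shell
#!/usr/bin/env bash
# Sketch only: orders.csv and customers.csv are hypothetical example files.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Two small comma-delimited files with header rows.
printf 'id,name\n1,alice\n2,bob\n' > customers.csv
printf 'customer_id,amount\n1,10\n1,5\n2,7\n' > orders.csv

query='select c.name, sum(o.amount) from orders.csv o join customers.csv c on o.customer_id = c.id group by c.name'

if command -v q >/dev/null 2>&1; then
  # First run: -C readwrite creates a .qsql cache next to each file if none exists.
  q -H -d , -C readwrite "$query"
  # Later runs: -C read serves the same JOIN from the two cache files.
  q -H -d , -C read "$query"
else
  echo "q is not installed; skipping the queries"
fi
```

On files this small the timing difference is negligible; the point is only that each file gets its own cache and that a multi-file query can reuse both.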

@@ -139,15 +104,3 @@ Removed the dual py2/py3 support. Since q is packaged as a self-contained execut
Users which for some reason still use q's main source code file directly and use python 2 would need to stay with the latest 2.0.19 release. In some next version, q's code structure is going to change significantly anyway in order to become a standard python module, so using the main source code file directly would not be possible.

If you are such a user, and this decision hurts you considerably, please ping me.


# Installation of the new beta release
For now, only Linux RPM, DEB, Mac OSX and Windows are supported. Packages for additional Linux Distros will be added later (it should be rather easy now, due to the use of `fpm`).

The beta OSX version is not in `brew` yet, you'll need to take the `macos-q` executable, put it in your filesystem and `chmod +x` it.

Note: For some reason showing the q manual (`man q`) does not work for Debian, even though it's packaged in the DEB file. I'll get around to fixing it later. If you have any thoughts about this, please drop me a line.

Download the relevant files directly from [The Beta Release Assets](https://github.com/harelba/q/releases/tag/v3.1.1-beta).

