v changes
Various things that could be added/optimized, and general ideas/thoughts/design/etc musings so I don't forget
- make "core" folder
- better .h separation
- import less for cleaner and faster compilation
- separate: channel_layout.h, reader_get.h, reader_put.h, reader_read.h
- log.h: only include in DEBUG_VGM or manually
- forward declarations: hide hca decoder.h, g72x_state
- simplify names (coding/blah_decoder.c > coding/blah.c)
- separate txth/txtp into files
- decoder: forward declarations in various codecs
- decoder: use AICA step rather than index
- decoder: coding_CBD2_int not actually used, bad name
- decoder: frame_size for psx-cfg, adx, ms-ima, mtaf
- renames: switch opus to nxopus/opusnx?
- renames: 3DS IMA to NW (nintendoware) IMA?
- renames: bfwav to bxwav
- streamfile: remove bar streamfile
- streamfile: substitute streamfile->open and other direct calls with accessors
- core: rename num_streams > stream_count / subsong_count
- core: clean unnecessary VGMSTREAM variables (ws_output_size)
- core: setup_state_vgmstream(vgmstream) called twice
- core: always load loop structs to simplify logic? (problem with many channels?)
- core: add variable for layout config to separate codec config
- describe: change "metadata from" to "metadata", "stream total samples" > "stream duration" (has floats)
- describe: vgmstream->num_streams >= 1 to detect when to write subsongs?
- describe: remove inits in external calls
- plugins: Improve applying plugin config to avoid having to reapply on reset
- core: change STREAM_NAME_NAME > vgmstream.stream_name_size
- core: pimpl
- core: don't preload loop config in init vgmstream but after reading vgmstream and doing setup_vgmstream?
- allows setting 0/1 outside loop function
- uses more memory with many channels
- core: generic path helper lib
- add flat_layout and none_decoder for cleaner handling of nothing set
- meta: try_vgmstream helper
- pass try_t (like ovmi): .init = ..; .meta = ..; .ext = [...]; .offset = ...; .size =
- try_init_vgmstreams (*list of: init, ext, offset, size)
- awb/aax: pass list of possible metas + exts (see .psb)
- problem: needs to create fakename exts per file
use meta_t to swap extension (needs restoring after swapping)
- meta: ffmpeg init helpers: wma, ac3, mp4
- meta: improve c regex/wildcard for pairs: msf/xwb/acb/dual stereo/etc
- maybe new vgmstream system
- v = prealloc_vgmstream()
- load params manually (channels, etc)
- on init vgmstream, realloc_vgmstream(v), init with channels/etc or gives error
- functions that modify v[ch] should call realloc first
- array_grow(**ptr, etc) for txth/txtp
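A rough sketch of what an array_grow style helper could look like. The struct, the doubling policy and the function names are assumptions for illustration only (not existing vgmstream code); it keeps the current count in len and the max available in size:

```c
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical growable array of ints: 'len' is the current count,
 * 'size' is the max available before another realloc is needed. */
typedef struct {
    int* items;
    size_t len;
    size_t size;
} int_array_t;

/* Ensures room for at least 'needed' items. Returns 1 on success, 0 on failure
 * (on failure the array is left untouched). */
static int array_grow(int_array_t* arr, size_t needed) {
    size_t new_size;
    int* tmp;

    if (needed <= arr->size)
        return 1;

    new_size = arr->size ? arr->size : 16;
    while (new_size < needed) {
        if (new_size > SIZE_MAX / 2)
            return 0; /* doubling would overflow */
        new_size *= 2;
    }
    if (new_size > SIZE_MAX / sizeof(int))
        return 0; /* byte count would overflow */

    tmp = realloc(arr->items, new_size * sizeof(int));
    if (!tmp)
        return 0;

    arr->items = tmp;
    arr->size = new_size;
    return 1;
}

/* Appends one value, growing first if needed. */
static int array_push(int_array_t* arr, int value) {
    if (!array_grow(arr, arr->len + 1))
        return 0;
    arr->items[arr->len++] = value;
    return 1;
}
```

Callers would start from a zeroed struct and free(arr.items) when done.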
- buf_len (current count) vs buf_size (max available)
- "TODO" in caps
- verbosity: streamfile > sf, buffer > buf / buf_size, dst/src, ...
- Var num; // Double space before comment
- pointer order: near type "type* blah" (Linux kernel style)
- consistent with function return type: "type* fun_blah(...)" not "type *fun_blah(...)"
- name may be anything but type is always "pointer to type" = * closer to type
- less function parameters (too many = can be split)
- bitmasks: ~15 more readable than 0xFFFFFF00?
- constants
const int max = 15; //consumes memory
int a[max]; //Invalid declaration outside of a function
enum { max = 15 }; //doesn't consume memory, but only ints
int a[max]; //OK outside
#define MAX 10 // can be any type, less clean, but beware using () ex #define MAX (10*10)
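The constant options above, as a small compilable sketch. All names are made up for the example; the last function shows why a macro with an expression probably wants the parentheses:

```c
#include <stddef.h>

/* enum constant: an integer constant expression, usable for file-scope arrays */
enum { MAX_ENTRIES = 15 };

/* macro constant: may be any type, parenthesized so it expands safely */
#define MAX_BLOCK_SIZE (10 * 10)

static int entries[MAX_ENTRIES];            /* OK outside a function */
static unsigned char block[MAX_BLOCK_SIZE]; /* OK outside a function */

static size_t entries_count(void) {
    return sizeof(entries) / sizeof(entries[0]);
}

static size_t block_size(void) {
    return sizeof(block);
}

/* without the parentheses this would expand to 200 / 10 * 10 = 200 */
static int blocks_in_200(void) {
    return 200 / MAX_BLOCK_SIZE;
}
```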
- use .c.inc for include files?
- #ifdef over #pragma
- maybe: <stdbool.h> bool fn(bool b)
- remove .pos for ogg (fake format, use TXTP)
- remove fake exts:
- vawx, str+sth, sts, bdsp, mn_str, .str in swvr
- stma and variations
- brstmspm: use txtp
- .wmus: fake ext?
- .zwdsp: use txth
- .g1l: extract from container / use txth
- clean brstm/rwsd/cstr
- break .sps badly ripped
- missing formats with names: GSB+GSP, XA30 (utf16)
- clean sli/sfl, clean baf
- move adx key detection code to adx_keys.c
- .mih cleanup
- rename raw metas to raw_xxxxx (init_vgmstream_pc_al2)
- raw-ish stuff: lower priority
- move various dsp clones to dsp meta
- init_vgmstream_ngc_str
- ffmpeg: reject some extensions to ensure they play in proper meta?
- or only accept certain extensions (flac, mp3, aac, etc)
- remove .sth from sets
- don't play .R? (confuses)
- reorder: swap body+head load order to head+body
- head has IDs, easier to detect files
- examples: spd+spt
- missing channel layout: add xma, various places with ffmpeg, mta2?
- capcom .mca fix bad rip hack
- inline sample_type? (num_samples = 0x1000 bytes? 0B1000?)
- remove unused codecs? (coding_PCM8_U_int? IMA4?)
- remove unused codec modes? (atrac3 joint stereo?, dsp byte interleave?)
- new old codecs
- siren14 for .s14?
- .sc EXAKT SASSC 8-bit DPCM [PS2]
- dsp coefs read immediate like other formats instead of using last header_file?
- txth math: Recursive Descent Parsing / shunting-yard calculator in c
- operate with unary/binary functions, with @ being one (makes "@(4 + 4)" possible)
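A minimal recursive descent sketch of the calculator idea, with @ handled as one more unary operator. Here @ reads a single byte from a memory buffer standing in for the file, which is only an assumption for the example (TXTH reads would go through the streamfile and have sizes/endianness). All names are hypothetical and arithmetic overflow checks are left out:

```c
#include <ctype.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical parser state: 'p' walks the text, 'data' is what '@' reads from,
 * 'error' is set on any syntax or range problem. */
typedef struct {
    const char* p;
    const uint8_t* data;
    size_t data_size;
    int error;
} calc_t;

static long parse_expr(calc_t* c);

static void skip_spaces(calc_t* c) {
    while (isspace((unsigned char)*c->p)) c->p++;
}

/* factor := number | '(' expr ')' | '-' factor | '@' factor */
static long parse_factor(calc_t* c) {
    skip_spaces(c);
    if (*c->p == '(') {
        long value;
        c->p++;
        value = parse_expr(c);
        skip_spaces(c);
        if (*c->p == ')') c->p++; else c->error = 1;
        return value;
    }
    if (*c->p == '-') {
        c->p++;
        return -parse_factor(c);
    }
    if (*c->p == '@') { /* unary "read byte at offset" */
        long offset;
        c->p++;
        offset = parse_factor(c);
        if (offset < 0 || (size_t)offset >= c->data_size) { c->error = 1; return 0; }
        return c->data[offset];
    }
    if (isdigit((unsigned char)*c->p)) {
        long value = 0;
        while (isdigit((unsigned char)*c->p))
            value = value * 10 + (*c->p++ - '0');
        return value;
    }
    c->error = 1;
    return 0;
}

/* term := factor (('*' | '/') factor)* */
static long parse_term(calc_t* c) {
    long left = parse_factor(c);
    for (;;) {
        char op;
        long right;
        skip_spaces(c);
        op = *c->p;
        if (op != '*' && op != '/')
            return left;
        c->p++;
        right = parse_factor(c);
        if (op == '/' && right == 0) { c->error = 1; return 0; }
        left = (op == '*') ? left * right : left / right;
    }
}

/* expr := term (('+' | '-') term)* */
static long parse_expr(calc_t* c) {
    long left = parse_term(c);
    for (;;) {
        char op;
        long right;
        skip_spaces(c);
        op = *c->p;
        if (op != '+' && op != '-')
            return left;
        c->p++;
        right = parse_term(c);
        left = (op == '+') ? left + right : left - right;
    }
}

/* Returns 1 and sets *result on success, 0 on error or trailing garbage. */
static int calc_eval(const char* text, const uint8_t* data, size_t data_size, long* result) {
    calc_t c = { text, data, data_size, 0 };
    long value = parse_expr(&c);
    skip_spaces(&c);
    if (c.error || *c.p != '\0')
        return 0;
    *result = value;
    return 1;
}
```

Each grammar level is one function, so adding more unary/binary operators mostly means adding a case at the right level. Shunting-yard would trade the recursion for an explicit operator stack.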
- hist for IMA (needed?)
- bug: read num_samples with subfiles: problem, subfile is opened at end
- encoder delay in mp3
- add name_offset to the name table?
- better security when including files (name list only in dir), make comment?
- option to swap l+r, saturn PCM goes right first?
- command to add generic filters (such as resample like XA)
- alt time modes? bpm = N, time 10B = beats (for Wwise)
- overlapped transitions
- Smash Hit uses overlapped segments
- paper mario loop layers different points: https://hcs64.com/mboard/forum.php?showthread=63084
- meta: segfaults/div by zero on bad headers
- fuzz testing could help, but many bugs trigger only when an exact field is 0
- coding: buffer overflows on bad data
- shouldn't happen anymore but codecs could use some fuzz testing
- layout: infinite loops with invalid data
- needs more tests
- ubsan:
- left shift: dsp psx dtk vadpcm xa asf circus(all) ubi-adpcm
- test: procion, imuse, xmd, eaxa, EA_MT
- big shifts in compresswave
- misc UB to check (assumed to not happen by compiler)
- memcpy in overlapping buffers
- a[i] = i++;
- overflows, bad pointer refs, etc
- C99 standard has a list of undefined behaviors in appendix J.2
- buffer overflow
- shifts overflow: int64_t i = 1; i <<= 72
- negative shifts: i << -10
- signed overflow (unsigned overflow is defined and wraps around)
- casting float to int unrepresentable values
- various things that compiler should warn (ex. not return'ing in a non-void function)
- referencing NULLs: beware compiler optimizing UBs
- int val = struct.a; if (struct==NULL) {...} //if is removed since struct 'asserts' non-nullity
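Some of the UB cases listed above (shift counts, signed overflow, NULL tested after use) could be guarded with small helpers like these. A sketch only; the helper names and the header_t struct are invented for the example:

```c
#include <stdint.h>
#include <stddef.h>

/* Left shift on unsigned values, rejecting negative counts and counts >= 32
 * (both are undefined for a 32-bit operand). */
static uint32_t shl_u32(uint32_t value, int count) {
    if (count < 0 || count >= 32)
        return 0;
    return value << count;
}

/* Signed addition that reports overflow instead of invoking UB. */
static int add_s32_checked(int32_t a, int32_t b, int32_t* out) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

typedef struct { int channels; } header_t; /* hypothetical struct */

/* NULL is tested before the first dereference, so the compiler
 * has no reason to assume non-nullity and drop the check. */
static int get_channels(const header_t* h) {
    if (h == NULL)
        return -1;
    return h->channels;
}
```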
- improve malloc/calloc bound checks (possible to alloc big amounts)
- maybe compile with ndebug _ndebug flags
- drmemory:
- ffmpeg handle leak?
- mpg123 "initialized read" bug
- common: move 32bit DLL to subdir
- compilers: GCC try link time optimizations
- compilers: try -fcatch-undefined-behavior in clang
- make: parallel builds don't work with -j correctly
- common: fix 64 warnings size_t etc
- msvc: simplify options
- unicode character set for all? (winamp uses multibyte)
- remove CodeAnalysisRuleSet?
- msvc: vgmstream_full.sln to vgmstream.sln (maybe move to build/win32)
- docs: PR join in libatrac9 + update commit
- improve mpg123/etc libs include dirs
- linux clashing with system libs
- try set variables in CMakeLists for ext-libs
- readme: use tables (https://github.com/ifcaro/Open-PS2-Loader)
- readme: clean extensions and put formats
- formats .py to generate .md with info
- msvc: rename avformat.exp to full name
- use some kind of bat instead of calling in .vcxproj
- cmake use that vcxproj (fails due to programs not in path?)
- improve autoreleases:
- https://github.com/BtbN/FFmpeg-Builds/releases
- duckstation, pcsx2
- cmake: libg719_decode remove cmake and use makefile
- move 32-bit .dll to subfolder
- info
- bitreader: optimize (mainly for wwise = faster wem)
- bitreader: bitreader_lsb move to utils, rename calls
- mixing: crosstrack/crosslayer can be optimized with padding
- mixing: pre-create fades/etc coefs in table + (doesn't need precision, non audible)
- cache: improve !tags.m3u performance
- foobar: needs custom cache; test if has meta = dont load tags
- core: reduce stack size to fix some loop unroll optimizations in cog?
- core: check static inline (ex. read_x) bloat?
- core: aligned reads performance:
- core: test performance in 64b (more registers = faster? faster function calls?)
- samples: optimized planar to mixed with unrolled loops
- compiler: flags
- gcc test: -msse, -msse2, -msse3, -march=native (not too noticeable, maybe for some decoders/mixing)
gcc -Q --help=target
- try C uint_fast in critical code
- sf: open_streamfile_by_filename improve copy to avoid dir separator double check
- sf: buffer init not make bigger than filesize
- sf: change buffer to code + structure
- sf: excessive buffer copying
- test removing buffer and reading directly
- avoid reading from buffer in some cases?
- read_buffer(**buf, size) to get streamfile buffer for faster reads
- fread give buffer access if small enough? (less copying)
- worth? decoders should read full frames anyway
- make some kind of sf_reader for meta, use standard sf for decoders
- sf: test double buffer speed
- sf: snap reads to 0x10/etc for less re-reads?
- ring buffer (unit 1kb) to improve rebuffer?
- remove set buffer size when calling open_streamfile
- FILE optimizations
- test multiple buffers but same FILE vs different FILEs
- test not duplicating FILE (faster vs reusing?)
- profiling
- manual tick counts
- http://www.codersnotes.com/sleepy/
- https://software.intel.com/en-us/vtune
- http://gpuopen.com/compute-product/codexl/
- https://marketplace.visualstudio.com/items?itemName=ArtemGevorkyan.MicroProfilerx86
- https://marketplace.visualstudio.com/items?itemName=ArtemGevorkyan.MicroProfilerx64x86
- micro optimizations in tight loops (probably not useful due to compiler optimizations):
- minimize jump/branches and improve predictions (order likely ifs first)
- avoid repeated flags tests/operations/ifs/etc
- access arrays sequentially
- less casts
- nibble_shift could shift ^= 4 to rotate
- simplify pow/sqrt/etc
- memsets / struct something foo = {0} beware
- modulo is slow (implicit div)
- pow2 constants faster than free values (simplified to & x-1)
- for unroll: stereo codecs could be faster if parsed 2 channels at the same time
- use smaller size tables ex int8_t rather than int16
- (sample & (1 << i)) use (sample & (1U << i)) to avoid implicit casting
- float literal always use n.nf (outside tables)
- union performs worse (prevents some register optimizations)
- (i&1)==1 maybe better than !
- LUT (look up tables) not always faster due to CPU cache miss
- LUT not useful if only used once per function (cache trash)
- LUT maybe not-static is faster (copied to function stack)
- LUT may prevent vectorization
- unnecessary masking (a & ff)
- clamp loop instead of bound check
- init once: static const double log2 = log(2.0); ..
- keep together functions used often (CPU cache)
- const int
- better 1 struct with a,b than 2 arrays a[] b[] (memory kept together)
- buffers always power-of-2 sized
- -fno-builtin may improve performance in some cases
- reorder some instructions for parallel calculations
- float y = a + b + c + d vs y = (a+b) + (c+d)
- not auto due to worse rounding (but done with -ffast-math)
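Two of the micro optimizations above (toggling the nibble shift with ^= 4 and replacing modulo with a mask for power-of-2 sizes), sketched with made-up names. Whether either is measurably faster than what the compiler already emits would need profiling:

```c
#include <stdint.h>
#include <stddef.h>

/* Extracts 4-bit values, low nibble first. 'shift ^= 4' flips between 0 and 4
 * instead of testing (i & 1) on every sample. */
static void unpack_nibbles(const uint8_t* src, size_t count, uint8_t* dst) {
    int shift = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        dst[i] = (src[i / 2] >> shift) & 0x0F;
        shift ^= 4;
    }
}

enum { RING_SIZE = 8 }; /* must be a power of 2 for the mask to work */

/* Same result as 'pos % RING_SIZE' without the implicit division. */
static size_t ring_wrap(size_t pos) {
    return pos & (RING_SIZE - 1);
}
```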
- sample typedef to sample_t
- off_t/size_t
- size_t is unsigned but off_t signed (subtle subtract bugs)
- always size_t: problematic for existing code, not full support in MSVC?
- always off_t: wonky usage, not full support in MSVC?
- always off64_t: slower in 32b compilations?
- always uint32_t/int64_t: ?
- use custom typedef: not standard (kind of confusing)
- vsize_t / voff_t? > int64_t
- size_t "result of the sizeof operator"
- off_t "represent file sizes" (posix?)
- fix 64b printf (%zu or PRIx64)
- fix shadowing issues
- replace long with int32/int64
- static enums?
- don't compare size_t to 0xFFFFFFFF due to size in 64b (use SIZE_MAX?)
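For the printf and 0xFFFFFFFF points, something like the following should behave the same in 32-bit and 64-bit builds. Function names are invented for the example, and %zu assumes a C99-conforming printf (older MSVC runtimes lacked it):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Formats an offset + size pair: PRIx64 for 64-bit values, %zu for size_t. */
static int describe_range(char* buf, size_t buf_size, int64_t offset, size_t size) {
    return snprintf(buf, buf_size, "offset=0x%" PRIx64 " size=%zu", (uint64_t)offset, size);
}

/* "Invalid size" marker that follows the width of size_t,
 * unlike a literal 0xFFFFFFFF. */
static int is_invalid_size(size_t size) {
    return size == SIZE_MAX;
}
```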
- signedness
- mixing types can lead to bugs and weirdness
// all unsigned
if (available - size >= optional) //BUG when available < size
// all signed
(available >= optional + size) //BUG when optional + size too big
//not very common tbh
for (unsigned int i = 10; i >= 0; i--) ; //BUG: infinite loop
for (unsigned i = 10; i != -1; --i)
if (unsigned i != -1) //???
- beware int > size_t in lib calls (bound checking)
//beware bound checking
if (length <= 512) {
memcpy(buffer, data, length); //BUG if length < 0 (converted to a huge size_t)
}
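Possible safe versions of the cases above, as a sketch with invented helper names: compare before subtracting unsigned values, test an unsigned index before decrementing, and check a signed length on both ends before it reaches memcpy:

```c
#include <stddef.h>
#include <string.h>

/* 'available - size' would wrap around when available < size,
 * so that case is rejected before subtracting. */
static int has_room(size_t available, size_t size, size_t optional) {
    if (size > available)
        return 0;
    return available - size >= optional;
}

/* Counting down with an unsigned index: test, then decrement,
 * since 'i >= 0' is always true for unsigned types. */
static unsigned sum_down(const unsigned* values, size_t count) {
    unsigned total = 0;
    size_t i;
    for (i = count; i-- > 0; )
        total += values[i];
    return total;
}

/* A signed length is checked on both ends before it becomes a size_t. */
static int copy_checked(void* dst, size_t dst_size, const void* src, int length) {
    if (length < 0 || (size_t)length > dst_size)
        return 0;
    memcpy(dst, src, (size_t)length);
    return 1;
}
```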
- #define constants make unsigned just in case
- other langs don't support unsigned types
- semantic: ordinal values may be unsigned, bitmasks unsigned
- undetected overflow/underflow: unsigned/signed only changes error cases around
- beware signed to unsigned implicit conversions
- don't use unsigned for info that something shouldn't be negative?
- unsigned add may generate extra code for wraparound
- file size is also an offset
- size_t v = MAX, v = v*v ?? same signed
- MAX_SIZE? PRIx for printing?
- (foff_t)i must cast
- array indexing isn't unsigned as index can be negative:
- a[10], b=&a[8], a[-1] = 1 //perfectly ok
- don't use int to set 32b values? (embedded systems use 16b int?)
- performance penalties for using u8s and u16s as loop sentinels
- beware integer promotions
- any op long double = long double
- any op double = double
- any op float = float
- any op long = long (if any is lesser than long)
- any op int = int (if any is lesser than int)
- short + short = int
- unsigned + unsigned = unsigned
- signed + signed = signed
- signed + unsigned = unsigned
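The rules above can be checked at compile time with C11 _Generic, which selects on the type of an expression. A small sketch (needs a C11 compiler; the macro and tags are made up for the example):

```c
/* Maps the type of an expression to a small tag so it can be inspected. */
enum { T_INT, T_UINT, T_LONG, T_FLOAT, T_DOUBLE, T_OTHER };

#define TYPE_OF(expr) _Generic((expr), \
    int: T_INT, \
    unsigned int: T_UINT, \
    long: T_LONG, \
    float: T_FLOAT, \
    double: T_DOUBLE, \
    default: T_OTHER)

/* short + short: both operands are promoted to int first */
static int type_of_short_sum(void) {
    short a = 1, b = 2;
    return TYPE_OF(a + b);
}

/* signed + unsigned (same rank): the signed operand is converted to unsigned */
static int type_of_mixed_sign(void) {
    int a = -1;
    unsigned b = 1;
    return TYPE_OF(a + b);
}

/* any op float = float */
static int type_of_int_float(void) {
    int a = 1;
    float b = 2.0f;
    return TYPE_OF(a + b);
}

/* any op double = double */
static int type_of_float_double(void) {
    float a = 1.0f;
    double b = 2.0;
    return TYPE_OF(a + b);
}
```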
- reader_t
- improve get_string with buf (maybe don't copy but return pointer + strlen)
- read in chunks: find last \n (from buf+readbytes), set "max" to that position
- on next chunk, memmove max to end, then read rest
- read helpers: read_u16ve(x,y,be)
- reader, could pass externally
- worse performance due to callbacks, probably ok in metas
vgmstream_reader r = init_reader(sf)
vgmstream_reader r = init_reader_buf(buf) //same with buffer
r.set_endian(r, be)
r.s16(r, 0x00), r.u16(..), r.u16be(..)
r.s16o(r) //reads from internal offset
r.seek(r); r.s16o(r) //moves internal offset
r.set_offset(r, 0x100); r.s16(r, 0x0) //clamps reads from 0x100+4
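A buffer-only sketch of the reader idea. It uses plain functions instead of the function-pointer style in the pseudocode above (which sidesteps the callback cost mentioned), and the struct layout, names and "return 0 when out of bounds" behavior are all assumptions for the example:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical buffer-backed reader with a base offset and configurable endianness. */
typedef struct {
    const uint8_t* buf;
    size_t size;
    size_t base;     /* added to every read offset */
    int big_endian;
} reader_t;

static reader_t reader_init_buf(const uint8_t* buf, size_t size) {
    reader_t r = { buf, size, 0, 0 };
    return r;
}

static void reader_set_endian(reader_t* r, int big_endian) {
    r->big_endian = big_endian;
}

static void reader_set_offset(reader_t* r, size_t base) {
    r->base = base;
}

/* Reads 16 bits at base + offset; out-of-bounds reads return 0. */
static uint16_t reader_u16(const reader_t* r, size_t offset) {
    const uint8_t* p;
    if (r->base > r->size || offset > r->size - r->base || r->size - r->base - offset < 2)
        return 0;
    p = r->buf + r->base + offset;
    return r->big_endian ?
        (uint16_t)((p[0] << 8) | p[1]) :
        (uint16_t)((p[1] << 8) | p[0]);
}

static int16_t reader_s16(const reader_t* r, size_t offset) {
    return (int16_t)reader_u16(r, offset);
}
```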
- improve str functions usage:
- not used correctly (hard)
- buffer overflows in (less probable) cases
- use common v_str* functions to separate them
- using non-standard functions may be confusing
- concatn change for v_concat which calls functions
- try to use str* functions when possible (optimized)
- clean utf code
- utf8_to_utf16 generic?
- too many files: may reach OS limit
- Symphonic Rain opens 99 total files (all come from a pack)
- check ZOE2 txtp multibank files
- MSF+MUS +100 segments
- Ubi SB/BAO +100 segments
- refcounted streamfile: for bank-like files w/ subsongs
- reuse same FILE N times
- set manually in txtp?
- but subfiles may be still extracted = name separate FILEs
- streamfile/file pool: external SF handling
- reuse same SF and don't close internally
- sf close returns to pool
- for full interleave files works ok too
- dynamic open/close files when played
- wwise style
- need to optimize vgmstream opening / config
- open and apply downmix config > need to save it
- on first open save current open_x
- SF always reopen
- devs are probably loading everything in memory too, maybe some related setting
- may need define SF handler in vgmstream_ctx_t / txtp
- api: define api.h with common stuff
- api.h = external, api_i.h = internal?
- api: make vgmstream easier to compile as DLL (opaque structs, get_option(x))
- extract info: ffmpeg_get_info(data,&channels, &sample_rate, ...)
- modify internals:
- api should return its own sample buffer to simplify float/etc
- rather than defining buffers again and again per plugin
- render: outputs samples done, -1 on eof, >=0 if ok
- after eof memset buf
- if infinite loop is set should never return -1
- api definition:
- vgmstream_ctx_t* init //return new vgmstream
- setup(ctx, cfg) //pass config to use before play
- open(ctx, sf) //start file
- play(ctx) //plays and returns current sbuf
- close(ctx) //deletes ctx
- may open other files/subsongs (resets internals but keeps setup, may cache stuff)
- names: libvgmstream_* (add #defines for shorter names if needed)
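A toy version of the init/setup/open/play/close lifecycle, only to show the shape of the calls. It is not the actual library API: the demo_ prefix, the config struct and the fake "decoder" (a sample counter that outputs silence) are all invented, and open takes a sample count where a real version would take a streamfile. In a real opaque API the struct definition would stay in the .c file:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { SBUF_SAMPLES = 256 };

typedef struct { int channels; } demo_config_t; /* hypothetical config */

typedef struct {
    demo_config_t cfg;
    int32_t samples_left;        /* stand-in for real decoder state */
    int16_t sbuf[SBUF_SAMPLES];  /* buffer owned by the context, not by each plugin */
} demo_ctx_t;

/* Returns a new zeroed context, or NULL on allocation failure. */
static demo_ctx_t* demo_init(void) {
    return calloc(1, sizeof(demo_ctx_t));
}

/* Stores the config to use before play. */
static void demo_setup(demo_ctx_t* ctx, const demo_config_t* cfg) {
    ctx->cfg = *cfg;
}

/* Starts a "file": resets internals but keeps the setup. */
static int demo_open(demo_ctx_t* ctx, int32_t total_samples) {
    if (total_samples < 0)
        return 0;
    ctx->samples_left = total_samples;
    return 1;
}

/* Returns samples done (>= 0) or -1 on EOF; *out points to the context's own buffer. */
static int demo_play(demo_ctx_t* ctx, const int16_t** out) {
    int32_t done;
    memset(ctx->sbuf, 0, sizeof(ctx->sbuf)); /* buffer is also silent after EOF */
    *out = ctx->sbuf;
    if (ctx->samples_left <= 0)
        return -1;
    done = ctx->samples_left < SBUF_SAMPLES ? ctx->samples_left : SBUF_SAMPLES;
    ctx->samples_left -= done;
    return (int)done;
}

/* Deletes the context. */
static void demo_close(demo_ctx_t* ctx) {
    free(ctx);
}
```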
- vgmstream ctx: pass meta cache to avoid re-reading subsong files again
- for big subsongs, ex. cache total_subsongs and file offsets
- vgmstream->(void*)cache, cache_size, that may be read/copied externally and passed to meta
- save function pointer to last valid meta
- make init_vgmstream_sf_subsong and read this
- same subsongs not always called one after other, queue cache? (in case of txtp)
- setup options
- resample/upmix for piping (ex. shoutcast to stereo)
- option to output samples: as-is (f32, s24, s16), to-s16, to-f32, f32-or-s16 (flags), etc
- define buf + type
- improve how mixing is done for easier handling of floats
- float decoders
- winamp max write is 8192
- should check outmod->write out value
- allow 24bit? > not very useful
- decoders:
- report sample type (F32, planar, ...) after open > use function call
- change internals so they don't depend on num_samples (may stop if no more samples/eof)
- interleave/deinterleave functions like sox
- [c1a c1b ... c2a c2b ...] <> [c1a c2a c1b c2b ...]
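The planar/interleaved conversion above for 16-bit samples, as a plain unoptimized sketch (function names are made up; 'samples' is the count per channel):

```c
#include <stdint.h>
#include <stddef.h>

/* planar [c1a c1b ... c2a c2b ...] to interleaved [c1a c2a c1b c2b ...] */
static void interleave_s16(const int16_t* planar, int16_t* mixed, size_t samples, size_t channels) {
    size_t s, ch;
    for (s = 0; s < samples; s++) {
        for (ch = 0; ch < channels; ch++) {
            mixed[s * channels + ch] = planar[ch * samples + s];
        }
    }
}

/* interleaved [c1a c2a c1b c2b ...] to planar [c1a c1b ... c2a c2b ...] */
static void deinterleave_s16(const int16_t* mixed, int16_t* planar, size_t samples, size_t channels) {
    size_t s, ch;
    for (s = 0; s < samples; s++) {
        for (ch = 0; ch < channels; ch++) {
            planar[ch * samples + s] = mixed[s * channels + ch];
        }
    }
}
```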
- configurable decoders like ffmpeg?
- request_sample_fmt for opus float (in the codecCtx, before initializing the decoder)
- add seek accurate flag for ogg? (loop)
- define packet/block/frame read functions
- meta may define + set functions
- may define a configurable common type
- must be chainable for blocks-within-blocks
- allow change vgmstream state
- lazy / don't read until needed, important for eof
- need to work ok with blocks-within-blocks
- define seek on block level?
- define packet_read that does reading and moves offset?
- test bad block read outside file to see if blocked works fine
- detect next block offset == current block offset
- clean existing blocks using a base "block layout" + define function
- on setup_vgmstream move block_layout function to callback
- pass info like: packet_t { offset, size }
- decoder sees packet_t, may divide into subpackets/frames
- give return this buffer with a sbuf_t type + info
- decoders may set sbuf_t to its own internal buffer if generic buffer doesn't work (same for mixing)
- if plugin needs its own buffer, use manual copy from internal buffer to external buffer
- allow setting target_samples so it can be called a bunch of times (won't do over max)
- universal seeking code also for looping
- seek code could be faster
- FSB/Wwise Vorbis/etc: looping uses discard seeking, slow
- noticeable on slow system when loop start is well into the file, ex. The 25th Ward's SLV_11, some wem
- may ask blocks to seek (knows better)
- read seek tables
- may need to define type: in data, part of container/block layout, subframes, etc
- make internal (offset, sample) list
- check common seek table formats
- issues with looping/seeking if table is wrong
- always loop manually instead of using seek tables?
- enable it via txtp? (can be fixed by disabling it)
- define seek functions in layouts/codecs
- defined (from seek table): skip to N
- block layout: read N blocks until close + consume N samples manually
- interleaved/flat: read frames + consume
- flat: call codec's seek (skip frames)
- check if seek pre-roll is needed
- maybe use filter between old and new sbufs
- use vorbis_packet_blocksize to get samples to skip
- https://kaworu.ch/blog/2013/09/29/writting-ogg-slash-vorbis-comment-in-c/
- seek interpolation with current
- plugin-like decoders
- simplify decoder / add new feed layout
- plugin extracts and sends samples N times
- structs should be opaque most of the time with extractor functions
- should be feed-like
- pass packet_t
- remove _int decoders
- simplify multistreams:
- instead of streamchannel define stream + channel numbers
- combine buffers at the end
- decoder loops
- instead of decoding up to exact point, define up to frame
- check how loop info in dsp works (most granular)
- clean decoders, ex.
- MP3 decoder: use next block offset and consume
- allow calls of less samples than block: can be used to simplify discard?
- allow complex blocks with a parent header like AWC
- allow multiple streams to overlap and play at once
- define N streams + M timepoints pointing to streams + stream pool
- when reaching a timepoint init stream
- may need to open the same stream multiple times, so needs a pool
- stream pool may open streams as needed, leave as cache
- may need to reuse FILEs to avoid reaching limit
- overlapped transition format:
- sets sequenced/playlist layout on txtp
- convert txtp definition into stream pool + timepoints
# without fades
bgm01.adx #jo 25s #ji 5s
bgm02.adx
# with fades
bgm01.adx #jo 25s P / 2s 0s #ji 5s P / 2s 0s
bgm02.adx
# meaning?
# - 0s: bgm01.adx
# - 25s: bgm02.adx
# - etc
- upload test suite
- cleanup needed
- upload for standard files created by public tools
- tags.m3u multiline comments
- detect "\n" and change to "\r\n" or " \n"?
- allow foobar's ";" separator that behaves like multiple tags?
- allow multiple composers?
# %COMPOSER Blah1
# %COMPOSER Blah2
- add .acf filters
- needed for better output in some cases (notably on Chunithm Paradise's Air op)
- mixing: add repeating fade?
- simplifies sequenced layout
- tags: create RIFF with comment tags (ID3)
- init: pass meta_context_t (sf, reader, ext/ext_size, cache, etc)
- streamfile void* field to pass meta_info_t with extensions, etc
- read standard/common tags
- ogg, wem adtl
- lazy load? callback and init only when needed since they are rarely used?
- max recursion to avoid stack overflow on subfiles?
- log improvements
- research multibuffers
- make core/ folder and cleanup
- define api
- separate mixing.c into files per filter
- change plugins for api
- prepare mixing for multibuffers
- change decoders to output floats / etc
- clean txtp parser
- add sequenced layout