bnnm edited this page Apr 9, 2023 · 1 revision

FUTURE CHANGES

Various things that could be added/optimized, and general ideas/thoughts/design/etc musings so I don't forget

code organization

code clarity

  • decoder: forward declarations in various codecs
  • decoder: use AICA step rather than index
  • decoder: coding_CBD2_int not actually used, bad name
  • decoder: frame_size for psx-cfg, adx, ms-ima, mtaf
  • renames: switch opus to nxopus/opusnx?
  • renames: 3DS IMA to NW (nintendoware) IMA?
  • renames: bfwav to bxwav
  • streamfile: remove bar streamfile
  • streamfile: substitute streamfile->open and other direct calls with accessors
  • core: rename num_streams > stream_count / subsong_count
  • core: clean unnecessary VGMSTREAM variables (ws_output_size)
  • core: setup_state_vgmstream(vgmstream) called twice
  • core: always load loop structs to simplify logic? (problem with many channels?)
  • core: add variable for layout config to separate codec config
  • describe: change "metadata from" to "metadata", "stream total samples" > "stream duration" (has floats)
  • describe: vgmstream->num_streams >= 1 to detect when to write subsongs?
  • describe: remove inits in external calls
  • plugins: Improve applying plugin config to avoid having to reapply on reset
  • core: change STREAM_NAME_SIZE > vgmstream.stream_name_size
  • core: pimpl
  • core: don't preload loop config in init vgmstream but after reading vgmstream and doing setup_vgmstream?
    • allows setting 0/1 outside loop function
    • uses more memory with many channels
  • core: generic path helper lib
  • add flat_layout and none_decoder for cleaner handling of nothing set

code utils

  • meta: try_vgmstream helper
    • pass try_t (like ovmi): .init = ..; .meta = ..; .ext = [...]; .offset = ...; .size =
    • try_init_vgmstreams (*list of: init, ext, offset, size)
    • awb/aax: pass list of possible metas + exts (see .psb)
    • problem: needs to create fakename exts per file

      use meta_t to swap extension (needs restoring after swapping)

  • meta: ffmpeg init helpers: wma, ac3, mp4
  • meta: improve c regex/wildcard for pairs: msf/xwb/acb/dual stereo/etc
  • maybe new vgmstream system
    • v = prelloc_vgmstream()
    • load params manually (channels, etc)
    • on init vgmstream, realloc_vgmstream(v), init with channels/etc or gives error
    • functions that modify v[ch] should call realloc first
  • array_grow(**ptr, etc) for txth/txtp
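
A minimal sketch of what an array_grow helper for txth/txtp lists could look like. The signature, the doubling policy and the initial capacity of 16 are assumptions for illustration, not existing vgmstream code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: grows *p_arr so it can hold at least `needed` items of
 * `item_size` bytes, doubling the capacity each time. Returns 1 on success and
 * 0 on failure, in which case the original array is left untouched. */
static int array_grow(void** p_arr, size_t* p_capacity, size_t needed, size_t item_size) {
    size_t new_capacity;
    void* tmp;

    assert(p_arr && p_capacity && item_size > 0);

    if (needed <= *p_capacity)
        return 1; /* already big enough */

    new_capacity = *p_capacity ? *p_capacity : 16;
    while (new_capacity < needed) {
        if (new_capacity > SIZE_MAX / 2)
            return 0; /* doubling would overflow */
        new_capacity *= 2;
    }
    if (new_capacity > SIZE_MAX / item_size)
        return 0; /* byte count would overflow */

    tmp = realloc(*p_arr, new_capacity * item_size);
    if (!tmp)
        return 0;

    *p_arr = tmp;
    *p_capacity = new_capacity;
    return 1;
}
```

Callers keep the array as a void* plus a capacity and cast when indexing; the current item count (buf_len style) would be tracked separately from the capacity (buf_size style).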

coding style

  • buf_len (current count) vs buf_size (max available)
  • "TODO" in caps
  • verbosity: streamfile > sf, buffer > buf / buf_size, dst/src, ...
  • Var num; // Double space before comment
  • pointer order: near type "type* blah" (Linux kernel style)
    • consistent with function return type: "type* fun_blah(...)" not "type *fun_blah(...)"
    • name may be anything but type is always "pointer to type" = * closer to type
  • less function parameters (too many = can be split)
  • bitmasks: ~15 more readable than 0xFFFFFF00?
  • constants
    const int max = 15;  //consumes memory
    int a[max];  //Invalid declaration outside of a function
     
    enum { max = 15 }; //doesn't consume memory, but only ints
    int a[max]; //OK outside
    
    #define MAX 10 // can be any type, less clean, but beware using () ex #define MAX (10*10)

meta cleanup

  • remove .pos for ogg (fake format, use TXTP)
  • remove fake exts:
    • vawx, str+sth, sts, bdsp, mn_str, .str in swvr
    • stma and variations
    • brstmspm: use txtp
    • .wmus: fake ext?
    • .zwdsp: use txth
    • .g1l: extract from container / use txth
  • clean brstm/rwsd/cstr
  • break .sps badly ripped
  • missing formats with names: GSB+GSP, XA30 (utf16)
  • clean sli/sfl, clean baf
  • move adx key detection code to adx_keys.c
  • .mih cleanup
  • rename raw metas to raw_xxxxx (init_vgmstream_pc_al2)
  • raw-ish stuff: lower priority
  • move various dsp clones to dsp meta
    • init_vgmstream_ngc_str
  • ffmpeg: reject some extensions to ensure they play in proper meta?
    • or only accept certain extensions (flac, mp3, aac, etc)
  • remove .sth from sets
  • don't play .R? (confuses)
  • reorder: swap body+head load order to head+body
    • head has IDs, easier to detect files
    • examples: spd+spt
  • missing channel layout: add xma, various places with ffmpeg, mta2?
  • capcom .mca fix bad rip hack

txth

  • inline sample_type? (num_samples = 0x1000 bytes? 0B1000?)
  • remove unused codecs? (coding_PCM8_U_int? IMA4?)
  • remove unused codec modes? (atrac3 joint stereo?, dsp byte interleave?)
  • new old codecs
    • siren14 for .s14?
    • .sc EXAKT SASSC 8-bit DPCM [PS2]
  • dsp coefs read immediate like other formats instead of using last header_file?
  • txth math: Recursive Descent Parsing / shunting-yard calculator in c
    • operate with unary/binary functions, with @ being one (makes "@(4 + 4)" possible)
  • hist for IMA (needed?)
  • bug: read num_samples with subfiles: problem, subfile is opened at end
  • encoder delay in mp3
  • add name_offset to the name table?
  • better security when including files (name list only in dir), make comment?
  • option to swap l+r, saturn PCM goes right first?
  • command to add generic filters (such as resample like XA)
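
For the txth math idea above, a recursive descent calculator could be sketched roughly as follows. Everything here is hypothetical (calc_t, calc_eval, and treating @ as "read a 32-bit little-endian value at this offset of a header buffer"); it omits integer overflow checks and a recursion depth limit, which real code fed untrusted files would need:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser state: expression text plus header bytes for '@' reads. */
typedef struct {
    const char* p;        /* current position in the expression */
    const uint8_t* data;  /* header data that '@' reads from */
    size_t data_size;
    int error;            /* set on syntax errors, bad reads or division by 0 */
} calc_t;

static int64_t parse_expr(calc_t* c);

static void skip_spaces(calc_t* c) {
    while (isspace((unsigned char)*c->p))
        c->p++;
}

/* factor := number | '(' expr ')' | '-' factor | '@' factor */
static int64_t parse_factor(calc_t* c) {
    skip_spaces(c);
    if (*c->p == '(') {
        int64_t v;
        c->p++;
        v = parse_expr(c);
        skip_spaces(c);
        if (*c->p == ')')
            c->p++;
        else
            c->error = 1;
        return v;
    }
    if (*c->p == '-') {
        c->p++;
        return -parse_factor(c);
    }
    if (*c->p == '@') { /* unary read: 32-bit LE value at the given offset */
        int64_t off;
        c->p++;
        off = parse_factor(c);
        if (c->error || off < 0 || c->data_size < 4 || (uint64_t)off > c->data_size - 4) {
            c->error = 1;
            return 0;
        }
        return (int64_t)((uint32_t)c->data[off] | ((uint32_t)c->data[off + 1] << 8) |
                ((uint32_t)c->data[off + 2] << 16) | ((uint32_t)c->data[off + 3] << 24));
    }
    if (isdigit((unsigned char)*c->p)) {
        char* end;
        int64_t v = strtoll(c->p, &end, 0); /* base 0: decimal, 0x hex, leading 0 octal */
        c->p = end;
        return v;
    }
    c->error = 1;
    return 0;
}

/* term := factor (('*' | '/') factor)* */
static int64_t parse_term(calc_t* c) {
    int64_t v = parse_factor(c);
    for (;;) {
        char op;
        int64_t rhs;
        skip_spaces(c);
        op = *c->p;
        if (op != '*' && op != '/')
            return v;
        c->p++;
        rhs = parse_factor(c);
        if (op == '*') {
            v *= rhs;
        } else if (rhs == 0) {
            c->error = 1;
            return 0;
        } else {
            v /= rhs;
        }
    }
}

/* expr := term (('+' | '-') term)* */
static int64_t parse_expr(calc_t* c) {
    int64_t v = parse_term(c);
    for (;;) {
        char op;
        skip_spaces(c);
        op = *c->p;
        if (op != '+' && op != '-')
            return v;
        c->p++;
        v = (op == '+') ? v + parse_term(c) : v - parse_term(c);
    }
}

/* Evaluates `text`; returns 1 and stores the result in *out, or 0 on any error. */
static int calc_eval(const char* text, const uint8_t* data, size_t data_size, int64_t* out) {
    calc_t c = { text, data, data_size, 0 };
    int64_t v = parse_expr(&c);
    assert(out != NULL);
    skip_spaces(&c);
    if (c.error || *c.p != '\0')
        return 0;
    *out = v;
    return 1;
}
```

Since @ is parsed as a unary operator at factor level, "@(4 + 4)" reads at offset 8, while "@4 + 4" reads at offset 4 and then adds 4.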

txtp

sanity

  • meta: segfaults/div by zero on bad headers
    • fuzz testing could help, but many bugs trigger only by having an exact field to 0
  • coding: buffer overflows on bad data
    • shouldn't happen anymore but codecs could use some fuzz testing
  • layout: infinite loops with invalid data
    • needs more tests
  • ubsan:
    • left shift: dsp psx dtk vadpcm xa asf circus(all) ubi-adpcm
      • test: procion, imuse, xmd, eaxa, EA_MT
    • big shifts in compresswave
  • misc UB to check (assumed to not happen by compiler)
    • memcpy in overlapping buffers
    • a[i] = i++;
    • overflows, bad pointer refs, etc
    • C99 standard has a list of undefined behaviors in appendix J.2
    • buffer overflow
    • shifts overflow: int64_t i = 1; i <<= 72
    • negative shifts: i << -10
    • signed overflow (unsigned overflow is defined and wraps around)
    • casting float to int unrepresentable values
    • various things that compiler should warn (ex. not return'ing in a non-void function)
    • referencing NULLs: beware compiler optimizing UBs
      • int val = struct.a; if (struct==NULL) {...} //if is removed since struct 'asserts' non-nullity
  • improve malloc/calloc bound checks (possible to alloc big amounts)
  • maybe compile with ndebug _ndebug flags
  • drmemory:
    • ffmpeg handle leak?
    • mpg123 "initialized read" bug
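
For the left shift warnings that ubsan reports in ADPCM-style decoders, one possible fix is to shift the unsigned representation and to sign-extend with arithmetic instead of shifts. This is only a sketch with hypothetical helper names; converting the result back to int32_t is implementation-defined (not undefined) and gives the expected two's complement value on common compilers:

```c
#include <assert.h>
#include <stdint.h>

/* Left-shifts a possibly negative value without invoking UB: the shift is done
 * on the unsigned representation, and the shift count is range-checked first. */
static int32_t shl_s32(int32_t value, int shift) {
    assert(shift >= 0 && shift < 32);
    return (int32_t)((uint32_t)value << shift);
}

/* Sign-extends the low 4 bits of `nibble` (range -8..7) using only defined
 * operations, instead of the (nibble << 28) >> 28 style trick. */
static int32_t sign_extend_nibble(uint8_t nibble) {
    int32_t v = nibble & 0x0F;
    return (v ^ 0x08) - 0x08;
}

/* Clamps to 16-bit range before storing a decoded sample. */
static int16_t clamp16(int32_t v) {
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}
```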

builds

  • common: move 32bit DLL to subdir

  • compilers: GCC try link time optimizations

  • compilers: try -fsanitize=undefined in clang (replaces the old -fcatch-undefined-behavior)

  • make: parallel builds don't work with -j correctly

  • common: fix 64 warnings size_t etc

  • msvc: simplify options

    • unicode character set for all? (winamp uses multibyte)
    • remove CodeAnalysisRuleSet?
  • msvc: vgmstream_full.sln to vgmstream.sln (maybe move to build/win32)

  • docs: PR join in libatrac9 + update commit

  • improve mpg123/etc libs include dirs

    • linux clashing with system libs
  • try set variables in CMakelists for ext-libs

  • readme: use tables (https://github.com/ifcaro/Open-PS2-Loader)

  • readme: clean extensions and put formats

    • formats .py to generate .md with info
  • msvc: rename avformat.exp to full name

    • use some kind of bat instead of calling in .vcxproj
    • cmake use that vcxproj (fails due to programs not in path?)
  • improve autoreleases:

  • cmake: libg719_decode remove cmake and use makefile

  • move 32-bit .dll to subfolder

  • info

performance - general

  • bitreader: optimize (mainly for wwise = faster wem)
  • bitreader: bitreader_lsb move to utils, rename calls
  • mixing: crosstrack/crosslayer can be optimized with padding
  • mixing: pre-create fades/etc coefs in table + (doesn't need precision, non audible)
  • cache: improve !tags.m3u performance
    • foobar: needs custom cache; test if has meta = dont load tags
  • core: reduce stack size to fix some loop unroll optimizations in cog?
  • core: check static inline (ex. read_x) bloat?
  • core: aligned reads performance:
  • core: test performance in 64b (more registers = faster? faster function calls?)
  • samples: optimized planar to mixed with unrolled loops
  • compiler: flags
    • gcc test: -msse, -msse2, -msse3, -march=native (not too noticeable, maybe for some decoders/mixing)
  • gcc -Q --help=target
  • try C uint_fast in critical code

performance - files

  • sf: open_streamfile_by_filename improve copy to avoid dir separator double check
  • sf: buffer init not make bigger than filesize
  • sf: change buffer to code + structure
  • sf: excessive buffer copying
    • test removing buffer and reading directly
    • avoid reading from buffer in some cases?
    • read_buffer(**buf, size) to get streamfile buffer for faster reads
    • fread give buffer access if small enough? (less copying)
    • worth? decoders should read full frames anyway
    • make some kind of sf_reader for meta, use standard sf for decoders
  • sf: test double buffer speed
  • sf: snap reads to 0x10/etc for less re-reads?
    • ring buffer (unit 1kb) to improve rebuffer?
  • remove set buffer size when calling open_streamfile
  • FILE optimizations
    • test multiple buffers but same FILE vs different FILEs
    • test not duplicating FILE (faster vs reusing?)

performance - misc

  • profiling
  • micro optimizations in tight loops (probably not useful due to compiler optimizations):
    • minimize jump/branches and improve predictions (order likely ifs first)
    • avoid repeated flags tests/operations/ifs/etc
    • access arrays sequentially
    • less casts
    • nibble_shift could shift ^= 4 to rotate
    • simplify pow/sqrt/etc
    • memsets / struct something foo = {0} beware
    • modulo is slow (implicit div)
    • pow2 constants faster than free values (simplified to & x-1)
    • for unroll: stereo codecs could be faster if parsed 2 channels at the same time
    • use smaller size tables ex int8_t rather than int16
    • (sample & (1 << i)) use (sample & (1U << i)) to avoid implicit casting
    • float literal always use n.nf (outside tables)
    • union performs worse (prevents some register optimizations)
    • (i&1)==1 maybe better than !
    • LUT (look up tables) not always faster due to CPU cache miss
    • LUT not useful if only used once per function (cache trash)
    • LUT maybe not-static is faster (copied to function stack)
    • LUT may prevent vectorization
    • unnecessary masking (a & ff)
    • clamp loop instead of bound check
    • init once: static const double log2 = log(2.0); ..
    • keep together functions used often (CPU cache)
    • const int
    • better 1 struct with a,b than 2 arrays a[] b[] (memory kept together)
    • buffers always power-of-2 sized
    • -fno-builtin may improve performance in some cases
    • reorder some instructions for parallel calculations
      • float y = a + b + c + d vs y = (a+b) + (c+d)
        • not auto due to worse rounding (but done with -ffast-math)
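
A small sketch of two of the micro optimizations above: replacing modulo with a mask when the size is a power of 2, and toggling a nibble shift with ^= 4. Helper names are made up for illustration, and whether any of this beats what the compiler already does would need profiling:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `n` is a nonzero power of two. */
static int is_pow2(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Wraps `pos` into a ring buffer of `size` entries. Same result as pos % size,
 * but only valid when size is a power of two. */
static size_t ring_wrap(size_t pos, size_t size) {
    assert(is_pow2(size));
    return pos & (size - 1);
}

/* Alternates 0, 4, 0, 4... to select the low/high nibble of consecutive samples. */
static int next_nibble_shift(int shift) {
    return shift ^ 4;
}

/* Extracts the nibble selected by `shift` (0 = low, 4 = high). */
static uint8_t get_nibble(uint8_t byte, int shift) {
    return (uint8_t)((byte >> shift) & 0x0F);
}
```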

types

  • sample typedef to sample_t

  • off_t/size_t

    • size_t is unsigned but off_t signed (subtle subtract bugs)
    • always size_t: problematic for existing code, not full support in MSVC?
    • always off_t: wonky usage, not full support in MSVC?
    • always off64_t: slower in 32b compilations?
    • always uint32_t/int64_t: ?
    • use custom typedef: not standard (kind of confusing)
      • vsize_t / voff_t? > int64_t
    • size_t "result of the sizeof operator"
    • off_t "represent file sizes" (posix?)
  • fix 64b printf (%zu or PRIx64)

  • fix shadowing issues

  • replace long with int32/int64

  • static enums?

  • don't compare size_t to 0xFFFFFFFF due to size in 64b (use SIZE max?)

  • signedness

    • mixing type can lead to bugs and weirdness
    // all unsigned
    if (available - size >= optional) //BUG when available < size
    
    // all signed
    (available >= optional + size) //BUG when optional + size too big

    //not very common tbh
    for (unsigned int i = 10; i >= 0; i--) ; //BUG: infinite loop
    for (unsigned i = 10; i != -1; --i)

    if (unsigned i != -1) //???
    
      - beware int > size_t in lib calls (bound checking)
    //beware bound checking
    if (length <= 512) {
        memcpy(buffer, data, length); //BUG when length < 0 (passes the check, then converts to a huge size_t)
    }
  • #define constants make unsigned just in case

  • other langs don't support unsigned types

  • semantic: ordinal values may be unsigned, bitmasks unsigned

  • undetected overflow/underflow: unsigned/signed only changes error cases around

  • beware signed to unsigned implicit conversions

  • don't use unsigned for info that something shouldn't be negative?

  • unsigned add may generate extra code for wraparound

  • file size is also an offset

  • size_t v = MAX, v = v*v ?? same signed

    • MAX_SIZE? PRix for printing?
    • (foff_t)i must cast
  • array indexing isn't unsigned as index can be negative:

    • a[10], b=&a[8], a[-1] = 1 //perfectly ok
  • don't use int to set 32b values? (embed systems use 16b int?)

  • performance penalties for using u8s and u16s as loop sentinels

  • beware integer promotions

    • any op long double = long double
    • any op double = double
    • any op float = float
    • any op long = long (if any is lesser than long)
    • any op int = int (if any is lesser than int)
    • short + short = int
    • unsigned + unsigned = unsigned
    • signed + signed = signed
    • signed + unsigned = unsigned
  • reader_t

    • improve get_string with buf (maybe don't copy but return pointer + strlen)
    • read in chunks: find last \n (from buf+readbytes), set "max" to that position
    • on next chunk, memmove max to end, then read rest
  • read helpers: read_u16ve(x,y,be)

    • reader, could pass externally
      • worse performance due to callbacks, probably ok in metas
    • vgmstream_reader r = init_reader(sf)
    • vgmstream_reader r = init_reader_buf(buf) //same with buffer
    • r.set_endian(r, be)
    • r.s16(r, 0x00), r.u16(..), r.u16be(..)
    • r.s16o(r) //reads from internal offset
    • r.seek(r); r.s16o(r) //moves internal offset
    • r.set_offset(r, 0x100); r.s16(r, 0x0) //clamps reads from 0x100+4
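
A rough sketch of the buffer-backed reader idea. It uses plain functions instead of the r.s16(r, ...) function pointers noted above (which avoids the callback cost), and every name plus the "return 0 when out of bounds" policy is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer-backed reader; a streamfile-backed one would look alike. */
typedef struct {
    const uint8_t* buf;
    size_t size;
    size_t base;     /* added to every offset, like set_offset in the notes */
    int big_endian;  /* like set_endian in the notes */
} reader_t;

static reader_t reader_init_buf(const uint8_t* buf, size_t size) {
    reader_t r = { buf, size, 0, 0 };
    assert(buf != NULL || size == 0);
    return r;
}

/* Returns a pointer to `len` readable bytes at base+offset, or NULL if out of bounds. */
static const uint8_t* reader_at(const reader_t* r, size_t offset, size_t len) {
    if (r->base > r->size || offset > r->size - r->base)
        return NULL;
    if (len > r->size - r->base - offset)
        return NULL;
    return r->buf + r->base + offset;
}

/* Out-of-bounds reads return 0 rather than failing (a choice made for this sketch). */
static uint16_t reader_u16(const reader_t* r, size_t offset) {
    const uint8_t* p = reader_at(r, offset, 2);
    if (!p)
        return 0;
    return r->big_endian ? (uint16_t)((p[0] << 8) | p[1]) : (uint16_t)((p[1] << 8) | p[0]);
}

static int16_t reader_s16(const reader_t* r, size_t offset) {
    return (int16_t)reader_u16(r, offset);
}

static uint32_t reader_u32(const reader_t* r, size_t offset) {
    const uint8_t* p = reader_at(r, offset, 4);
    if (!p)
        return 0;
    if (r->big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) | ((uint32_t)p[1] << 8) | p[0];
}
```

A meta would set big_endian once after detecting the header and then use the same calls for both endianness variants, which is the same goal as read_u16ve(x,y,be).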

strings

  • improve str functions usage:
    • not used correctly (hard)
    • buffer overflows in (less probable) cases
  • use common v_str* functions to separate them
    • using non-standard functions may be confusing
  • concatn change for v_concat which calls functions
  • try to use str* functions when possible (optimized)
  • clean utf code
  • utf8_to_utf16 generic?

multifiles

  • too many files: may reach OS limit

    • Symphonic Rain opens 99 total files (all come from a pack)
    • check ZOE2 txtp multibank files
    • MSF+MUS +100 segments
    • Ubi SB/BAO +100 segments
  • refcounted streamfile: for bank-like files w/ subsongs

    • reuse same FILE N times
    • set manually in txtp?
    • but subfiles may be still extracted = name separate FILEs
  • streamfile/file pool: external SF handling

    • reuse same SF and don't close internally
    • sf close returns to pool
    • for full interleave files works ok too
  • dynamic open/close files when played

    • wwise style
    • need to optimize vgmstream opening / config
      • open and apply downmix config > need to save it
      • on first open save current open_x
      • SF always reopen
    • devs are probably loading everything in memory too, maybe some related setting
    • may need define SF handler in vgmstream_ctx_t / txtp

API

  • api: define api.h with common stuff

    • api.h = external, api_i.h = internal?
  • api: make vgmstream easier to compile as DLL (opaque structs, get_option(x))

    • extract info: ffmpeg_get_info(data,&channels, &sample_rate, ...)
  • modify internals:

    • api should return its own sample buffer to simplify float/etc
      • rather than defining buffers again and again per plugin
    • render: outputs samples done, -1 on eof, >=0 if ok
      • after eof memset buf
      • if infinite loop is set should never return -1
  • api definition:

    • vgmstream_ctx_t* init //return new vgmstream
    • setup(ctx, cfg) //pass config to use before play
    • open(ctx, sf) //start file
    • play(ctx) //plays and returns current sbuf
    • close(ctx) //deletes ctx
    • may open other files/subsongs (resets internals but keeps setup, may cache stuff)
    • names: libvgmstream_* (add #defines for shorter names if needed)
  • vgmstream ctx: pass meta cache to avoid re-reading subsong files again

    • for big subsongs, ex. cache total_subsongs and file offsets
    • vgmstream->(void*)cache, cache_size, that may be read/copied externally and passed to meta
    • save function pointer to last valid meta
      • make init_vgmstream_sf_subsong and read this
    • same subsongs not always called one after other, queue cache? (in case of txtp)
  • setup options

    • resample/upmix for piping (ex. shoutcast to stereo)
    • option to output samples: as-is (f32, s24, s16), to-s16, to-f32, f32-or-s16 (flags), etc
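
To make the init/setup/open/play/close flow and the render return values more concrete, here is a toy sketch. It is not the real vgmstream API: the demo_* names are invented, the struct would be opaque in an actual header, and the "decoder" just writes a counter so the flow can be tested:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUF_SAMPLES 256

typedef struct {
    int loop_forever;   /* if set, play never reports EOF */
} demo_cfg_t;

typedef struct {
    demo_cfg_t cfg;
    int total_samples;  /* length of the "opened" stream */
    int position;
    int16_t buf[DEMO_BUF_SAMPLES]; /* owned by the context, not by each plugin */
} demo_ctx_t;

static demo_ctx_t* demo_init(void) {
    return calloc(1, sizeof(demo_ctx_t));
}

/* Config is stored in the context, so it survives opening other files. */
static void demo_setup(demo_ctx_t* ctx, const demo_cfg_t* cfg) {
    ctx->cfg = *cfg;
}

/* Stand-in for open(ctx, sf): resets playback but keeps the config. */
static void demo_open(demo_ctx_t* ctx, int total_samples) {
    ctx->total_samples = total_samples;
    ctx->position = 0;
}

/* Fills the context buffer; returns samples done (>= 0), or -1 on EOF. */
static int demo_play(demo_ctx_t* ctx, const int16_t** p_buf) {
    int i, todo = ctx->total_samples - ctx->position;

    assert(p_buf != NULL);
    *p_buf = ctx->buf;
    if (todo <= 0) {
        if (!ctx->cfg.loop_forever || ctx->total_samples <= 0) {
            memset(ctx->buf, 0, sizeof(ctx->buf)); /* silence after EOF */
            return -1;
        }
        ctx->position = 0; /* loop back to the start instead of ending */
        todo = ctx->total_samples;
    }
    if (todo > DEMO_BUF_SAMPLES)
        todo = DEMO_BUF_SAMPLES;
    for (i = 0; i < todo; i++)
        ctx->buf[i] = (int16_t)((ctx->position + i) & 0x7FFF); /* fake decode */
    ctx->position += todo;
    return todo;
}

static void demo_close(demo_ctx_t* ctx) {
    free(ctx);
}
```

Returning a pointer to the context's own buffer is what lets plugins stop defining buffers again and again, and would also leave room for reporting a float buffer later.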

float/resampling/mixing

  • define buf + type

  • improve how mixing is done for easier handling of floats

  • float decoders

  • winamp max write is 8192

    • should check outmod->write out value
    • allow 24bit? > not very useful
  • decoders:

    • report sample type (F32, planar, ...) after open > use function call
    • change internals so they don't depend on num_samples (may stop if no more samples/eof)
  • interleave/deinterleave functions like sox

    • [c1a c1b ... c2a c2b ...] <> [c1a c2a c1b c2b ...]
  • configurable decoders like ffmpeg?

    • request_sample_fmt for opus float (in the codecCtx, before initializing the decoder)
    • add seek accurate flag for ogg? (loop)
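
The interleave/deinterleave pair above is simple enough to sketch directly. This version is for 16-bit samples only and uses made-up function names; a float variant would be the same loops with another type:

```c
#include <assert.h>
#include <stdint.h>

/* planar [c1a c1b ... c2a c2b ...] > interleaved [c1a c2a c1b c2b ...] */
static void interleave_s16(int16_t* dst, const int16_t* src, int channels, int samples) {
    int ch, s;
    assert(dst != src); /* not an in-place conversion */
    for (s = 0; s < samples; s++) {
        for (ch = 0; ch < channels; ch++) {
            dst[s * channels + ch] = src[ch * samples + s];
        }
    }
}

/* interleaved [c1a c2a c1b c2b ...] > planar [c1a c1b ... c2a c2b ...] */
static void deinterleave_s16(int16_t* dst, const int16_t* src, int channels, int samples) {
    int ch, s;
    assert(dst != src); /* not an in-place conversion */
    for (s = 0; s < samples; s++) {
        for (ch = 0; ch < channels; ch++) {
            dst[ch * samples + s] = src[s * channels + ch];
        }
    }
}
```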

layout / blocks

  • define packet/block/frame read functions

    • meta may define + set functions
    • may define a configurable common type
    • must be chainable for blocks-within-blocks
    • allow change vgmstream state
    • lazy / don't read until needed, important for eof
    • need to work ok with blocks-within-blocks
      • define seek on block level?
  • define packet_read that does reading and moves offset?

  • test bad block read outside file to see if blocked works fine

    • detect next block offset == current block offset
  • clean existing blocks using a base "block layout" + define function

    • on setup_vgmstream move block_layout function to callback
  • pass info like: packet_t { offset, size }

    • decoder sees packet_t, may divide into subpackets/frames
    • decoder returns this buffer with a sbuf_t type + info
    • decoders may set sbuf_t to its own internal buffer if generic buffer doesn't work (same for mixing)
    • if plugin needs its own buffer, use manual copy from internal buffer to external buffer
    • allow setting target_samples so it can be called a bunch of times (won't do over max)
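
A sketch of a block reader that hands out packet_t { offset, size } and stops on bad blocks instead of looping forever. The block format here is invented for the example (a 32-bit little-endian block size that includes its own 4-byte header), and read_block is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t offset;  /* payload start */
    size_t size;    /* payload size */
} packet_t;

/* Reads the block header at *p_offset, fills `packet` with the payload location
 * and moves *p_offset to the next block. Returns 1 if ok, 0 on EOF or bad data. */
static int read_block(const uint8_t* data, size_t data_size, size_t* p_offset, packet_t* packet) {
    size_t offset = *p_offset;
    uint32_t block_size;

    assert(data && packet);

    if (data_size < 4 || offset > data_size - 4)
        return 0; /* EOF, or header would be read outside the file */

    block_size = (uint32_t)data[offset] | ((uint32_t)data[offset + 1] << 8) |
            ((uint32_t)data[offset + 2] << 16) | ((uint32_t)data[offset + 3] << 24);

    if (block_size < 4)
        return 0; /* includes size 0, where next block offset == current block offset */
    if (block_size > data_size - offset)
        return 0; /* block runs past the end of the file */

    packet->offset = offset + 4;
    packet->size = block_size - 4;
    *p_offset = offset + block_size;
    return 1;
}
```

A layout would call this until it returns 0 and pass each packet_t to the decoder; blocks-within-blocks could reuse the same function on the payload range.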

seeking

  • universal seeking code also for looping

    • seek code could be faster
      • FSB/Wwise Vorbis/etc: looping uses discard seeking, slow
      • noticeable on slow system when loop start is well into the file, ex. The 25th Ward's SLV_11, some wem
    • may ask blocks to seek (knows better)
  • read seek tables

    • may need to define type: in data, part of container/block layout, subframes, etc
    • make internal (offset, sample) list
      • check common seek table formats
    • issues with looping/seeking if table is wrong
      • always loop manually instead of using seek tables?
      • enable it via txtp? (can be fixed by disabling it)
  • define seek functions in layouts/codecs

    • defined (from seek table): skip to N
    • block layout: read N blocks until close + consume N samples manually
    • interleaved/flat: read frames + consume
    • flat: call codec's seek (skip frames)
  • check if seek pre-roll is needed

  • seek interpolation with current
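
The internal (offset, sample) list could work roughly like this: find the last entry at or before the target sample, jump to its offset, then discard the remaining samples by decoding. Struct and function names are hypothetical, and the table is assumed to be sorted by sample:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t sample;  /* first sample decoded when starting at `offset` */
    uint32_t offset;
} seek_entry_t;

/* Finds where to start decoding to reach `target`: stores the offset to jump to
 * and how many decoded samples to discard after it. Returns 1 if ok, or 0 if the
 * table is empty or starts after the target (caller falls back to discard
 * seeking from the beginning). */
static int seek_plan(const seek_entry_t* entries, size_t count, uint32_t target,
        uint32_t* p_offset, uint32_t* p_discard) {
    size_t lo = 0, hi = count;

    assert(p_offset && p_discard);

    if (count == 0 || entries[0].sample > target)
        return 0;

    /* binary search: entries[lo].sample <= target holds at every step */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid].sample <= target)
            lo = mid;
        else
            hi = mid;
    }

    *p_offset = entries[lo].offset;
    *p_discard = target - entries[lo].sample;
    return 1;
}
```

If the table turns out to be wrong the result is a bad loop point, so a txtp switch to ignore the table and seek manually would still be useful.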

decoders

  • plugin-like decoders

    • simplify decoder / add new feed layout
    • plugin extracts and sends samples N times
  • structs should be opaque most of the time with extractor functions

  • should be feed-like

    • pass packet_t
  • remove _int decoders

  • simplify multistreams:

    • instead of streamchannel define stream + channel numbers
    • combine buffers at the end
  • decoder loops

    • instead of decoding up to exact point, define up to frame
    • check how loop info in dsp works (most granular)
  • clean decoders, ex.

    • MP3 decoder: use next block offset and consume
    • allow calls of less samples than block: can be used to simplify discard?
    • allow complex blocks with a parent header like AWC

layout / sequenced

  • allow multiple streams to overlap and play at once

  • define N streams + M timepoints pointing to streams + stream pool

  • when reaching a timepoint init stream

    • may need to open the same stream multiple times, so needs a pool
  • stream pool may open streams as needed, leave as cache

    • may need to reuse FILEs to avoid reaching limit
  • overlapped transition format:

    • sets sequenced/playlist layout on txtp
    • convert txtp definition into stream pool + timepoints
    # without fades
    bgm01.adx #jo 25s  #ji 5s
    bgm02.adx
    # with fades
    bgm01.adx #jo 25s P / 2s 0s  #ji 5s P / 2s 0s
    bgm02.adx
    
    # meaning?
    # - 0s: bgm01.adx
    # - 25s: bgm02.adx
    # - etc

misc features

roadmap

  • research multibuffers
  • make core/ folder and cleanup
  • define api
  • separate mixing.c into files per filter
  • change plugins for api
  • prepare mixing for multibuffers
  • change decoders to output floats / etc
  • clean txtp parser
  • add sequenced layout