Commit f8a633b

Faster SIMD alternative
1 parent f6612d2 commit f8a633b

4 files changed, +58 -29 lines changed

README.md

+3-3
@@ -53,7 +53,7 @@ Place input files in `input/yearYYYY/dayDD.txt` including leading zeroes. For ex
 ## Performance

 Benchmarks are measured using the built-in `cargo bench` tool run on an [Apple M2 Max][apple-link].
-All 225 solutions from 2023 to to 2015 complete sequentially in **625 milliseconds**.
+All 225 solutions from 2023 to 2015 complete sequentially in **622 milliseconds**.
 Interestingly 84% of the total time is spent on just 9 solutions.
 Performance is reasonable even on older hardware, for example a 2011 MacBook Pro with an
 [Intel i7-2720QM][intel-link] processor takes 3.5 seconds to run the same 225 solutions.
@@ -66,7 +66,7 @@ Performance is reasonable even on older hardware, for example a 2011 MacBook Pro
 | [2022](#2022) | 10 |
 | [2021](#2021) | 10 |
 | [2020](#2020) | 286 |
-| [2019](#2019) | 19 |
+| [2019](#2019) | 16 |
 | [2018](#2018) | 36 |
 | [2017](#2017) | 102 |
 | [2016](#2016) | 133 |
@@ -221,7 +221,7 @@ Performance is reasonable even on older hardware, for example a 2011 MacBook Pro
 | 13 | [Care Package](https://adventofcode.com/2019/day/13) | [Source](src/year2019/day13.rs) | 2513 |
 | 14 | [Space Stoichiometry](https://adventofcode.com/2019/day/14) | [Source](src/year2019/day14.rs) | 17 |
 | 15 | [Oxygen System](https://adventofcode.com/2019/day/15) | [Source](src/year2019/day15.rs) | 361 |
-| 16 | [Flawed Frequency Transmission](https://adventofcode.com/2019/day/16) | [Source](src/year2019/day16.rs) | 4124 |
+| 16 | [Flawed Frequency Transmission](https://adventofcode.com/2019/day/16) | [Source](src/year2019/day16.rs) | 1960 |
 | 17 | [Set and Forget](https://adventofcode.com/2019/day/17) | [Source](src/year2019/day17.rs) | 341 |
 | 18 | [Many-Worlds Interpretation](https://adventofcode.com/2019/day/18) | [Source](src/year2019/day18.rs) | 1077 |
 | 19 | [Tractor Beam](https://adventofcode.com/2019/day/19) | [Source](src/year2019/day19.rs) | 674 |

docs/pie-2019.svg

+15-15

docs/pie-all.svg

+11-11

src/year2019/day16.rs

+29
@@ -148,6 +148,11 @@ pub fn part2(input: &[u8]) -> usize {
     let upper = size * 10_000;
     assert!(lower <= start && start < upper);

+    compute(&digits, size, start, upper)
+}
+
+#[cfg(not(feature = "simd"))]
+fn compute(digits: &[usize], size: usize, start: usize, upper: usize) -> usize {
     let mut coefficients = [0; 8];
     let mut result = [0; 8];

@@ -163,6 +168,30 @@ pub fn part2(input: &[u8]) -> usize {
     result.fold_decimal()
 }

+#[cfg(feature = "simd")]
+fn compute(digits: &[usize], size: usize, start: usize, upper: usize) -> usize {
+    use std::simd::Mask;
+    use std::simd::Simd;
+
+    let mask: Mask<i32, 8> = Mask::from_bitmask(1);
+    let tens: Simd<u32, 8> = Simd::splat(10);
+
+    let mut coefficients: Simd<u32, 8> = Simd::splat(0);
+    let mut result: Simd<u32, 8> = Simd::splat(0);
+
+    for (k, index) in (start..upper).enumerate() {
+        coefficients = mask.select(
+            Simd::splat(binomial_mod_10(k + 99, k) as u32),
+            coefficients.rotate_elements_right::<1>(),
+        );
+
+        let next = Simd::splat(digits[index % size] as u32);
+        result += next * coefficients;
+    }
+
+    (result % tens).to_array().fold_decimal() as usize
+}
+
 /// Computes C(n, k) % 2
 ///
 /// This collapses to a special case of a product of only 4 possible values:
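
The SIMD variant of `compute` in the diff above keeps eight coefficients in a single vector: each iteration rotates the lanes right by one, then uses a mask with only bit 0 set to overwrite lane 0 with the newest coefficient. As a rough illustration of that lane movement only, the sketch below emulates it on a plain `[u32; 8]` in stable Rust. The names `shift_in` and `accumulate` are hypothetical and not part of the commit, and the coefficients are passed in as a slice instead of being computed with `binomial_mod_10`.

```rust
/// Scalar stand-in (hypothetical, not from the commit) for
/// `mask.select(Simd::splat(new), lanes.rotate_elements_right::<1>())`
/// when only lane 0 of the mask is set.
fn shift_in(lanes: [u32; 8], new: u32) -> [u32; 8] {
    let mut next = lanes;
    // Lane i moves to lane i + 1; the old lane 7 wraps around to lane 0...
    next.rotate_right(1);
    // ...where it is immediately overwritten by the fresh value.
    next[0] = new;
    next
}

/// Hypothetical helper mirroring the shape of the SIMD loop:
/// every step multiplies the current digit into all eight lanes at once.
fn accumulate(digits: &[u32], coefficients: &[u32]) -> [u32; 8] {
    let mut lanes = [0u32; 8];
    let mut result = [0u32; 8];

    for (&digit, &coefficient) in digits.iter().zip(coefficients) {
        lanes = shift_in(lanes, coefficient);
        for (sum, lane) in result.iter_mut().zip(lanes) {
            *sum += digit * lane;
        }
    }

    // Reduce once at the end, as the SIMD version does with `result % tens`.
    result.map(|sum| sum % 10)
}
```

With digits `[1, 2, 3]` and coefficients `[1, 4, 0]`, lane 0 ends as (1*1 + 2*4 + 3*0) % 10 = 9, lane 1 as (2*1 + 3*4) % 10 = 4, and lane 2 as 3. Lane j pairs each digit with the coefficient that entered lane 0 j steps earlier, which is why one rotate per step is enough to keep all eight running sums aligned.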
