regenerate the mime tables from apache httpd; general refactorings #2042

Draft: wants to merge 2 commits into base: main

devoncarew
Member

(note that this isn't intended to land - more to discuss potential changes)

Regenerate the mime tables from the apache httpd mapping and other general refactorings:

  • regenerate the mime tables from the list maintained by the apache httpd project; generate the tables into lib/src/mime_tables.g.dart
  • change the magic numbers to use string hex values for the header bytes instead of list literals (this makes the code more terse)
  • consolidate several library public symbols into lib/src/mime.dart
  • make MimeTypeResolver have a similar API to the current two top level library symbols - lookupMimeType() and extensionFromMime()

I think instead of looking to make the top-level lookupMimeType() and extensionFromMime() methods something you can add mime types to, we want to have everything accessed through a mime registry, and by default provide a global instance of that. The registry could be configurable at construction time.

MimeTypeResolver is somewhat similar to that, except that you can mutate it after construction. If we were to use it as our mime registry, the global instance would be mutable, which seems less desirable.
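To make the registry idea concrete, here is a minimal sketch of an immutable registry with a shared default instance. It is not the API in this PR: `MediaTypeRegistry`, `extend`, and `defaultRegistry` are hypothetical names, and the two-entry table is a stand-in for the generated tables.

```dart
/// Hypothetical immutable registry; not the API proposed in this PR.
class MediaTypeRegistry {
  final Map<String, String> _extensionToType;

  /// Copies [extensionToType], so later changes to the argument have no
  /// effect on this registry. Keys are normalized to lowercase.
  MediaTypeRegistry(Map<String, String> extensionToType)
      : _extensionToType = Map.unmodifiable({
          for (final entry in extensionToType.entries)
            entry.key.toLowerCase(): entry.value,
        });

  /// Returns a new registry with [extra] layered over this one's entries.
  MediaTypeRegistry extend(Map<String, String> extra) =>
      MediaTypeRegistry({..._extensionToType, ...extra});

  /// Looks up the media type for the file extension of [path], if any.
  String? lookupMimeType(String path) {
    final name = path.substring(path.lastIndexOf('/') + 1);
    final dot = name.lastIndexOf('.');
    if (dot < 0 || dot == name.length - 1) return null;
    return _extensionToType[name.substring(dot + 1).toLowerCase()];
  }
}

/// Shared default instance; a tiny stand-in table for illustration only.
final defaultRegistry = MediaTypeRegistry(const {
  'html': 'text/html',
  'png': 'image/png',
});
```

Because `extend` returns a new object, callers who need extra types get their own registry and the global default never changes.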

Related to #2028; cc @lrhn.

Package publishing

Package Version Status Publish tag (post-merge)
package:bazel_worker 1.1.3-wip WIP (no publish necessary)
package:benchmark_harness 2.3.1 already published at pub.dev
package:boolean_selector 2.1.2 already published at pub.dev
package:browser_launcher 1.1.3 already published at pub.dev
package:cli_config 0.2.1-wip WIP (no publish necessary)
package:cli_util 0.4.2 already published at pub.dev
package:clock 1.1.2 already published at pub.dev
package:code_builder 4.10.1 already published at pub.dev
package:coverage 1.12.0-wip WIP (no publish necessary)
package:csslib 1.0.2 already published at pub.dev
package:extension_discovery 2.1.0 already published at pub.dev
package:file 7.0.2-wip WIP (no publish necessary)
package:file_testing 3.1.0-wip WIP (no publish necessary)
package:glob 2.1.3 already published at pub.dev
package:graphs 2.3.3-wip WIP (no publish necessary)
package:html 0.15.5+1 ready to publish html-v0.15.5+1
package:io 1.1.0-wip WIP (no publish necessary)
package:json_rpc_2 3.0.3 already published at pub.dev
package:markdown 7.3.1-wip WIP (no publish necessary)
package:mime 2.0.0 already published at pub.dev
package:oauth2 2.0.4-wip WIP (no publish necessary)
package:package_config 2.2.0 already published at pub.dev
package:pool 1.5.2-wip WIP (no publish necessary)
package:pub_semver 2.2.0 already published at pub.dev
package:pubspec_parse 1.5.0 already published at pub.dev
package:source_map_stack_trace 2.1.3-wip WIP (no publish necessary)
package:source_maps 0.10.14-wip WIP (no publish necessary)
package:source_span 1.10.1 already published at pub.dev
package:sse 4.1.7 already published at pub.dev
package:stack_trace 1.12.1 already published at pub.dev
package:stream_channel 2.1.4 already published at pub.dev
package:stream_transform 2.1.2-wip WIP (no publish necessary)
package:string_scanner 1.4.1 already published at pub.dev
package:term_glyph 1.2.3-wip WIP (no publish necessary)
package:test_reflective_loader 0.2.3 already published at pub.dev
package:timing 1.0.2 already published at pub.dev
package:unified_analytics 7.0.2 ready to publish unified_analytics-v7.0.2
package:watcher 1.1.1 already published at pub.dev
package:yaml 3.1.3 already published at pub.dev
package:yaml_edit 2.2.2 already published at pub.dev

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

PR Health

Breaking changes ⚠️
| Package | Change | Current Version | New Version | Needed Version | Looking good? |
| --- | --- | --- | --- | --- | --- |
| mime | Breaking | 2.0.0 | 2.0.0 | 3.0.0 | ⚠️ Got "2.0.0" expected >= "3.0.0" (breaking changes) |

This check can be disabled by tagging the PR with skip-breaking-check.

Changelog Entry
Package Changed Files
package:mime pkgs/mime/lib/mime.dart
pkgs/mime/lib/src/extension.dart
pkgs/mime/lib/src/magic_number.dart
pkgs/mime/lib/src/magic_numbers.dart
pkgs/mime/lib/src/mime.dart
pkgs/mime/lib/src/mime_shared.dart
pkgs/mime/lib/src/mime_tables.g.dart
pkgs/mime/lib/src/mime_type.dart
pkgs/mime/lib/src/multipart_stream.dart
pkgs/mime/lib/src/multipart_transformer.dart
pkgs/mime/pubspec.yaml

Changes to files need to be accounted for in their respective changelogs.

This check can be disabled by tagging the PR with skip-changelog-check.

Coverage ⚠️
File Coverage
pkgs/mime/lib/mime.dart 💔 Not covered
pkgs/mime/lib/src/magic_numbers.dart 💚 100 %
pkgs/mime/lib/src/mime.dart 💚 83 %
pkgs/mime/lib/src/mime_shared.dart 💚 33 %
pkgs/mime/lib/src/mime_type.dart 💔 76 % ⬇️ 15 %
pkgs/mime/lib/src/multipart_stream.dart 💚 89 %
pkgs/mime/lib/src/multipart_transformer.dart 💚 100 %
pkgs/mime/tool/generate_markdown.dart 💔 Not covered
pkgs/mime/tool/update_media_types.dart 💔 Not covered

This check for test coverage is informational (issues shown here will not fail the PR).

This check can be disabled by tagging the PR with skip-coverage-check.

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbols
License Headers ✔️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
no missing headers

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/bazel_worker/benchmark/benchmark.dart
pkgs/bazel_worker/example/client.dart
pkgs/bazel_worker/example/worker.dart
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart
pkgs/boolean_selector/example/example.dart
pkgs/clock/lib/clock.dart
pkgs/clock/lib/src/clock.dart
pkgs/clock/lib/src/default.dart
pkgs/clock/lib/src/stopwatch.dart
pkgs/clock/lib/src/utils.dart
pkgs/clock/test/clock_test.dart
pkgs/clock/test/default_test.dart
pkgs/clock/test/stopwatch_test.dart
pkgs/clock/test/utils.dart
pkgs/coverage/lib/src/coverage_options.dart
pkgs/coverage/test/collect_coverage_config_test.dart
pkgs/coverage/test/config_file_locator_test.dart
pkgs/html/example/main.dart
pkgs/html/lib/dom.dart
pkgs/html/lib/dom_parsing.dart
pkgs/html/lib/html_escape.dart
pkgs/html/lib/parser.dart
pkgs/html/lib/src/constants.dart
pkgs/html/lib/src/encoding_parser.dart
pkgs/html/lib/src/html_input_stream.dart
pkgs/html/lib/src/list_proxy.dart
pkgs/html/lib/src/query_selector.dart
pkgs/html/lib/src/token.dart
pkgs/html/lib/src/tokenizer.dart
pkgs/html/lib/src/treebuilder.dart
pkgs/html/lib/src/utils.dart
pkgs/html/test/dom_test.dart
pkgs/html/test/parser_feature_test.dart
pkgs/html/test/parser_test.dart
pkgs/html/test/query_selector_test.dart
pkgs/html/test/selectors/level1_baseline_test.dart
pkgs/html/test/selectors/level1_lib.dart
pkgs/html/test/selectors/selectors.dart
pkgs/html/test/support.dart
pkgs/html/test/tokenizer_test.dart
pkgs/pubspec_parse/test/git_uri_test.dart
pkgs/stack_trace/example/example.dart
pkgs/watcher/test/custom_watcher_factory_test.dart
pkgs/yaml_edit/example/example.dart

@github-actions github-actions bot added the type-infra A repository infrastructure change or enhancement label Mar 17, 2025
),
];

Uint8List hex(String encoded) {
Member

Consider dropping the hex and allowing the list to be constant.
Use String hex encodings like:

MagicNumber(
    'image/heif',
    '\x00\x00\x00\x00\x66\x74\x79\x70\x6D\x69\x66\x31',
    mask: '\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF',
),

Having the list be constant means you won't have all the computation of converting every string to a Uint8List on first access.
Then work directly off the strings, using codeUnitAt instead of List.[].

(I considered making the hex-parsing lazy, but you're likely to access every element if you try to magic-number a random file.)
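A rough sketch of what the constant, string-backed form could look like. The constructor shape follows the snippet above; the field names and the `matches` method are hypothetical, and the mask is assumed to have the same length as the byte string.

```dart
/// Hypothetical constant magic-number entry; names are illustrative only.
class MagicNumber {
  final String mimeType;

  /// Header bytes, stored as one code unit (0x00-0xFF) per byte.
  final String bytes;

  /// Optional mask, assumed to be the same length as [bytes]. A 0x00 mask
  /// byte means that position is ignored.
  final String? mask;

  const MagicNumber(this.mimeType, this.bytes, {this.mask});

  /// Whether [header] starts with [bytes], honoring [mask] when present.
  bool matches(List<int> header) {
    if (header.length < bytes.length) return false;
    final mask = this.mask;
    for (var i = 0; i < bytes.length; i++) {
      final m = mask == null ? 0xFF : mask.codeUnitAt(i);
      if ((header[i] & m) != (bytes.codeUnitAt(i) & m)) return false;
    }
    return true;
  }
}

/// A constant list: nothing is converted or allocated on first access.
const magicNumbers = [
  MagicNumber(
    'image/heif',
    '\x00\x00\x00\x00\x66\x74\x79\x70\x6D\x69\x66\x31',
    mask: '\x00\x00\x00\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF',
  ),
];
```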

final result = Uint8List(encoded.length ~/ 2);
for (var i = 0; i < result.length; i++) {
final offset = i * 2;
result[i] = int.parse(encoded.substring(offset, offset + 2), radix: 16);
Member

I think we have a hex codec somewhere.
... found it: https://pub.dev/documentation/convert/latest/convert/HexCodec-class.html

Don't know if it's worth using. This code can definitely be optimized.
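If the hex strings stay, one possible optimization is to decode from code units directly instead of allocating a substring per byte. This is a sketch only; `_digit` is a hypothetical helper, and the `hex` codec in `package:convert` linked above would be the off-the-shelf alternative.

```dart
import 'dart:typed_data';

/// Returns the value of one hex digit code unit; throws on anything else.
int _digit(int codeUnit) {
  if (codeUnit >= 0x30 && codeUnit <= 0x39) return codeUnit - 0x30; // 0-9
  final lower = codeUnit | 0x20; // Folds A-F onto a-f.
  if (lower >= 0x61 && lower <= 0x66) return lower - 0x61 + 10; // a-f
  throw FormatException('Invalid hex digit', String.fromCharCode(codeUnit));
}

/// Decodes a hex string such as `89504E47` without creating substrings.
Uint8List hex(String encoded) {
  if (encoded.length.isOdd) {
    throw FormatException('Odd-length hex string', encoded);
  }
  final result = Uint8List(encoded.length ~/ 2);
  for (var i = 0; i < result.length; i++) {
    final high = _digit(encoded.codeUnitAt(i * 2));
    final low = _digit(encoded.codeUnitAt(i * 2 + 1));
    result[i] = (high << 4) | low;
  }
  return result;
}
```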

@@ -53,7 +41,7 @@ class MimeTypeResolver {
/// though a file have been saved using the wrong file-name extension. If less
/// than [magicNumbersMaxLength] bytes was provided, some magic-numbers won't
/// be matched against.
-String? lookup(String path, {List<int>? headerBytes}) {
+String? lookupMimeType(String path, {List<int>? headerBytes}) {
Member

If we are adding new members, is now a good time to change the name to lookupMediaType?

if (result != null) return result;
if (_useDefault) {
-result = defaultExtensionMap[ext];
+result = extensionToMime[ext];
if (result != null) return result;
}
return null;
Member

Can be reduced to:

 return _extensionToMime[ext] ?? (_useDefault ? extensionToMime[ext] : null);


for (final entry in _extensionToMime.entries) {
if (entry.value == mimeType) {
return entry.key;
Member

This returns the first entry with that media type as value.
Is that guaranteed to be the canonical one? (Probably, if it's a linked hash map and the canonical one was added first. But might be worth mentioning.)

Member Author

I'm certain I saw wording to that effect in one of the mime registries I was looking at, but don't see that in the docs for the apache httpd mime list. Their mime registry may not have an idea of a canonical file ext. :(

Member

That would be the android one: https://android.googlesource.com/platform/frameworks/base/+/769f04c8f03a/mime/java-res/android.mime.types

"defines a mapping from one MIME type to the first of the listed extensions,"
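If "first listed extension wins" is the rule we want, it can be made explicit by building the reverse map once rather than scanning entries on every call. A sketch, assuming the input map iterates in insertion order (true for Dart map literals and `LinkedHashMap`); `buildTypeToExtension` is a hypothetical name.

```dart
/// Builds a media-type-to-extension map in which the first extension
/// encountered for each type wins.
Map<String, String> buildTypeToExtension(Map<String, String> extensionToType) {
  final result = <String, String>{};
  extensionToType.forEach((extension, type) {
    // Later extensions for an already-seen type are ignored.
    result.putIfAbsent(type, () => extension);
  });
  return result;
}
```

This also turns reverse lookup into a single map access, and the doc comment is a natural place to state the ordering guarantee.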

import 'package:test/test.dart';

void main() {
group('defaultExtensionMap', () {
test('keys are lowercase', () {
-for (final key in defaultExtensionMap.keys) {
+for (final key in extensionToMime.keys) {
Member

Since the original name didn't contain "Mime", consider using "Media" rather than "Mime" in the new name. It's a free "Media"!

final response = await http.get(Uri.parse(mimeTypesUrl));

final lines = response.body
.split('\n')
Member

Use LineSplitter.
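For reference, a small sketch of the `LineSplitter` variant. Unlike `split('\n')`, it also handles `\r\n` and bare `\r` line endings. `contentLines` is a hypothetical helper, and skipping `#` comment lines is an assumption about how the script treats the file.

```dart
import 'dart:convert';

/// Splits [body] into lines, dropping blank lines and `#` comment lines.
List<String> contentLines(String body) => [
      for (final line in LineSplitter.split(body))
        if (line.trim().isNotEmpty && !line.startsWith('#')) line,
    ];
```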


final wsRegex = RegExp(' +');
for (var line in lines) {
line = line.replaceAll('\t', ' ').replaceAll(wsRegex, ' ');
Member

That can just be line = line.replaceAll(RegExp(r'[ \t]{2,}'), ' ');

'https://raw.githubusercontent.com/apache/httpd/refs/heads/trunk/docs/conf/mime.types';

print('Reading from $mimeTypesUrl ...');
final response = await http.get(Uri.parse(mimeTypesUrl));
Member

@lrhn lrhn Mar 17, 2025

I'd consider not having the script automatically fetch from someone else's server. If we have some scheduling accident, we might start causing significant load.

Consider instead downloading the file into the repo manually, and generating from that file.
Or at least have a default mode where you generate from the downloaded file, so you have to pass a flag to download a new one.
Then you won't repeatedly download while tinkering with the script.

(That's what I do in package:characters to not repeatedly fetch from the Unicode site. It requires human action to actually reach out to a server.)
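A sketch of the "download only when asked" mode. The function name, parameters, and the idea of passing the fetch step in as a callback are all hypothetical; in the real script `fetch` would presumably wrap the existing `http.get` call and `download` would come from a command-line flag.

```dart
import 'dart:io';

/// Returns the contents of the checked-in copy at [cachePath].
///
/// Only when [download] is true is [fetch] called; its result replaces the
/// local copy before it is read. By default no network access happens.
Future<String> loadMimeTypes({
  required String cachePath,
  required bool download,
  required Future<String> Function() fetch,
}) async {
  final cache = File(cachePath);
  if (download) {
    await cache.writeAsString(await fetch());
  }
  return cache.readAsString();
}
```

With this shape, repeated runs while tinkering with the generator read only the local file, and refreshing the data is an explicit human action.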

for (var line in lines) {
line = line.replaceAll('\t', ' ').replaceAll(wsRegex, ' ');

final segments = line.split(' ');
Member

And we can combine with this too, getting final segments = line.split(RegExp(r'[\t ]+'));
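Put together, the per-line parsing might look like the sketch below: a single split on runs of tabs and spaces, with no separate normalization pass. `parseLine` is a hypothetical helper, and treating `#` lines and types without extensions as skippable is an assumption about the input file.

```dart
final _whitespace = RegExp(r'[\t ]+');

/// Parses one `mime.types` line into its media type and extensions.
///
/// Returns null for blank lines, `#` comments, and types that list no
/// extensions.
MapEntry<String, List<String>>? parseLine(String line) {
  line = line.trim();
  if (line.isEmpty || line.startsWith('#')) return null;
  final segments = line.split(_whitespace);
  if (segments.length < 2) return null;
  return MapEntry(segments.first, segments.sublist(1));
}
```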

| `audio/midi` | `mid` | `kar`, `midi`, `rmi` |
-| `audio/mp4` | `m4a` | `m4b`, `mp4a` |
+| `audio/mp4` | `m4a` | `mp4a` |
| `audio/mpeg` | `mpga` | `m2a`, `m3a`, `mp2`, `mp2a`, `mp3` |
Member

See #2048
We might not want to trust that the first extension is the best default extension.
They seem to be ordered alphabetically, so the file is a mapping from extension to file type, not the other direction.

For some file types, some extensions are definitely better as default than others, like mp3 or ogg.
For others, it doesn't even make sense to have a default. See, for example, application/x-msdownload: those are completely different kinds of files, and assigning the wrong extension would be a mistake, but to a web server they are all things you want to download on Windows.
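One way this could be handled is a small hand-maintained layer on top of the generated data. The sketch below is only an illustration: the names are hypothetical and the two tables hold just the examples mentioned above, not a vetted list.

```dart
/// Hypothetical hand-maintained overrides; the entry is an example only.
const preferredExtensions = {
  'audio/mpeg': 'mp3',
};

/// Hypothetical set of types for which no single default makes sense.
const noDefaultExtension = {'application/x-msdownload'};

/// Picks a default extension for [type], preferring explicit overrides and
/// falling back to the first extension listed in [typeToExtensions].
String? defaultExtension(
  String type,
  Map<String, List<String>> typeToExtensions,
) {
  if (noDefaultExtension.contains(type)) return null;
  final override = preferredExtensions[type];
  if (override != null) return override;
  final extensions = typeToExtensions[type];
  if (extensions == null || extensions.isEmpty) return null;
  return extensions.first;
}
```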

Labels
package:mime type-infra A repository infrastructure change or enhancement
2 participants