
Commit 4af14ac

Reordered sources in sources documentation
1 parent c93d87b

1 file changed: +30 -28 lines changed


sources.md

```diff
@@ -121,6 +121,36 @@ and access towards related query data using a programmable search engine.
 - Data available through JSON format
 
 
+## Internet Archive
+
+**Description:**
+The Internet Archive is a nonprofit digital library offering free access to millions of digital materials including books, movies, software, music, and websites. This project uses the Internet Archive’s Session and Search API to fetch metadata of items that reference Creative Commons licenses.
+
+**API documentation link:**
+- [InternetArchive Tools and APIs](https://archive.org/developers/index-apis.html)
+- [InternetArchive: A Python Interface to archive.org](https://internetarchive.readthedocs.io/en/stable/internetarchive.html)
+- [The Internet Archive Python Library](https://archive.org/developers/internetarchive/)
+- [The Internet Archive Search API reference](https://archive.org/advancedsearch.php)
+- [A Python interface to archive.org.](https://pypi.org/project/internetarchive/)
+- [Internet Archive Python client; Session + Search Items](https://github.com/jjjake/internetarchive/tree/master/internetarchive)
+
+**API information:**
+- No API key required
+- Pagination supported via rows and start parameters
+- Python access via internetarchive library (search_items, ArchiveSession)
+- Query limit: None specified, but rate-limiting may apply (1000000 max at a time)
+- Data available through JSON format
+- Retry logic and session management implemented for reliability
+
+**Notes:**
+- This project queries for items containing `text:creativecommons.org` in their metadata.
+- The script extracts and normalizes license URLs and language codes
+- In summary, it queries licenseurl and language fields for all items containing "creativecommons.org" in their metadata
+- Aggregated counts are saved to CSV files for licenses and languages.
+- License normalization uses a canonical mapping defined in `license_url_to_identifier_mapping.csv`.
+- Language normalization using Babel and [iso-639](https://pypi.org/project/iso639-lang/) see [github information](https://github.com/jacksonllee/iso639), see also [iso-639 standards](https://www.loc.gov/standards/iso639-2/), you can also checkout [iso639-2](https://www.loc.gov/standards/iso639-2/php/English_list.php)
+
+
 ## Openverse
 
 **Description:** Openverse is a search engine for openly licensed media,
@@ -168,31 +198,3 @@ language edition of wikipedia. It runs on the Meta-Wiki API.
 - No API key required
 - Query limit: It is rate-limited only to prevent abuse
 - Data available through XML or JSON format
-
-## Internet Archive
-
-**Description:**
-The Internet Archive is a nonprofit digital library offering free access to millions of digital materials including books, movies, software, music, and websites. This project uses the Internet Archive’s Session and Search API to fetch metadata of items that reference Creative Commons licenses.
-
-**API documentation link:**
-- [InternetArchive: A Python Interface to archive.org](https://internetarchive.readthedocs.io/en/stable/internetarchive.html)
-- [The Internet Archive Python Library](https://archive.org/developers/internetarchive/)
-- [The Internet Archive Search API reference](https://archive.org/advancedsearch.php)
-- [A Python interface to archive.org.](https://pypi.org/project/internetarchive/)
-- [Internet Archive Python client; Session + Search Items](https://github.com/jjjake/internetarchive/tree/master/internetarchive)
-
-**API information:**
-- No API key required
-- Pagination supported via rows and start parameters
-- Python access via internetarchive library (search_items, ArchiveSession)
-- Query limit: None specified, but rate-limiting may apply (1000000 max at a time)
-- Data available through JSON format
-- Retry logic and session management implemented for reliability
-
-**Notes:**
-- This project queries for items containing `text:creativecommons.org` in their metadata.
-- The script extracts and normalizes license URLs and language codes
-- In summary, it queries licenseurl and language fields for all items containing "creativecommons.org" in their metadata
-- Aggregated counts are saved to CSV files for licenses and languages.
-- License normalization uses a canonical mapping defined in `license_url_to_identifier_mapping.csv`.
-- Language normalization using Babel and [iso-639](https://pypi.org/project/iso639-lang/) see [github information](https://github.com/jacksonllee/iso639), see also [iso-639 standards](https://www.loc.gov/standards/iso639-2/), you can also checkout [iso639-2](https://www.loc.gov/standards/iso639-2/php/English_list.php)
```

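The **API information** list in the Internet Archive entry mentions two things: pagination via `rows` and `start`, and retry logic for reliability. The sketch below shows one way those ideas can fit together. It is an illustration under assumptions, not the project's script. `fetch_page`, `fetch_with_retry`, `iter_results`, the default page size of 100, the retry count and the backoff schedule are all hypothetical choices. The real request, for example one made through the `internetarchive` library's `search_items`, is deliberately left out so that the example runs offline.

```python
import time
from typing import Callable, Dict, Iterator, List

# Hypothetical page fetcher: fetch_page(start, rows) returns a list of result
# dicts, or an empty list once the result set is exhausted. In practice this
# would wrap a real request to the Internet Archive search service.
FetchPage = Callable[[int, int], List[Dict]]

# Errors treated as transient (assumption); anything else propagates at once.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def fetch_with_retry(fetch_page: FetchPage, start: int, rows: int,
                     max_retries: int = 3, backoff: float = 1.0) -> List[Dict]:
    """Fetch one page, retrying transient errors with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fetch_page(start, rows)
        except TRANSIENT_ERRORS:
            if attempt >= max_retries:
                raise  # out of retries: surface the last error
            time.sleep(backoff * (2 ** attempt))  # 1s, 2s, 4s, ... with backoff=1.0
            attempt += 1


def iter_results(fetch_page: FetchPage, rows: int = 100,
                 max_retries: int = 3, backoff: float = 1.0) -> Iterator[Dict]:
    """Yield every result, advancing `start` by `rows` after each full page."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    start = 0
    while True:
        page = fetch_with_retry(fetch_page, start, rows, max_retries, backoff)
        yield from page
        if len(page) < rows:
            break  # a short (or empty) page means there is nothing left
        start += rows


if __name__ == "__main__":
    # Fake in-memory "search service" with 250 matching items.
    records = [{"identifier": f"item{i}"} for i in range(250)]

    def fake_fetch(start: int, rows: int) -> List[Dict]:
        return records[start:start + rows]

    total = sum(1 for _ in iter_results(fake_fetch, rows=100))
    print(total)  # 250
```

Stopping on a short page saves one extra request when the total is not a multiple of `rows`. The empty-page check covers the case where it is.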

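The **Notes** describe a three-step pipeline: extract `licenseurl` and `language`, normalize them, and save aggregated counts to CSV. The minimal sketch below follows those steps, with these assumptions stated up front:

- `LICENSE_MAP` is a hypothetical, hand-written stand-in for `license_url_to_identifier_mapping.csv`, whose actual columns are not shown here.
- `LANGUAGE_MAP` replaces the Babel and iso639 lookups with a tiny dictionary.
- Unknown or missing languages fall back to the ISO 639-2 code `und` ("undetermined").
- The URL clean-up rules and CSV headers are illustrative only.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Optional, TextIO, Tuple

# Hypothetical stand-in for license_url_to_identifier_mapping.csv:
# normalized license URL -> canonical identifier.
LICENSE_MAP: Dict[str, str] = {
    "creativecommons.org/licenses/by/4.0": "CC BY 4.0",
    "creativecommons.org/licenses/by-sa/4.0": "CC BY-SA 4.0",
    "creativecommons.org/publicdomain/zero/1.0": "CC0 1.0",
}

# Hypothetical stand-in for the Babel / iso639 lookups used by the project.
LANGUAGE_MAP: Dict[str, str] = {
    "en": "en", "eng": "en", "english": "en",
    "es": "es", "spa": "es", "spanish": "es",
}


def first_value(value):
    """Metadata fields may be a string or a list; keep the first entry."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def normalize_license_url(url: str) -> str:
    """Lowercase and drop scheme, 'www.', query/fragment and trailing slashes."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    url = url.split("?", 1)[0].split("#", 1)[0]
    return url.rstrip("/")


def license_identifier(url: str) -> str:
    """Map a raw license URL to its canonical identifier, or 'UNKNOWN'."""
    return LICENSE_MAP.get(normalize_license_url(url), "UNKNOWN")


def language_code(value: Optional[str]) -> str:
    """Map a raw language value to an ISO 639 code, defaulting to 'und'."""
    if not value:
        return "und"  # ISO 639-2 code for "undetermined"
    return LANGUAGE_MAP.get(value.strip().lower(), "und")


def aggregate(items: Iterable[Dict]) -> Tuple[Counter, Counter]:
    """Count canonical license identifiers and language codes across items."""
    licenses: Counter = Counter()
    languages: Counter = Counter()
    for item in items:
        url = first_value(item.get("licenseurl"))
        if url:
            licenses[license_identifier(url)] += 1
        languages[language_code(first_value(item.get("language")))] += 1
    return licenses, languages


def write_counts(counts: Counter, fh: TextIO, header: Tuple[str, str]) -> None:
    """Write a header row, then (key, count) rows, most frequent first."""
    writer = csv.writer(fh)
    writer.writerow(header)
    for key, count in counts.most_common():
        writer.writerow([key, count])


if __name__ == "__main__":
    sample = [
        {"licenseurl": "http://creativecommons.org/licenses/by/4.0/", "language": "English"},
        {"licenseurl": "https://www.creativecommons.org/licenses/by/4.0", "language": ["eng"]},
        {"licenseurl": "https://creativecommons.org/publicdomain/zero/1.0/"},
    ]
    licenses, languages = aggregate(sample)
    # For real files, open them with open(path, "w", newline="") as the csv module expects.
    buf = io.StringIO()
    write_counts(licenses, buf, ("LICENSE", "COUNT"))
    print(buf.getvalue())
```

`Counter.most_common()` sorts the output by frequency, which makes the CSV files easy to scan. If stable diffs between runs matter more, sorting by key would be the alternative.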