Commit 6b7664a

Add new seminar
1 parent 3121214 commit 6b7664a

2 files changed: +74 −1 lines

content/event/2024-10-03/index.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ all_day: false
 # Schedule page publish date (NOT event date).
 publishDate: 2024-09-27T00:00:00Z
 
-authors: [camachocolladosj]
+authors: [ousidhoumn]
 tags: []
 
 # Is this a featured event? (true/false)

content/event/2024-10-10/index.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@ (new file, 73 added lines)
---
# Documentation: https://wowchemy.com/docs/managing-content/

title: "Seminar: \"Natural Experiments in NLP and Where to Find Them\""
# event:
# event_url:
location: Abacws
# address:
#   street:
#   city:
#   region:
#   postcode:
#   country:
summary: Talk by [Pietro Lesci](https://pietrolesci.github.io/) (University of Cambridge)
abstract: "In training language models, training choices—such as the random seed for data ordering or the token vocabulary size—significantly influence model behaviour. Answering counterfactual questions like \"How would the model perform if this instance were excluded from training?\" is computationally expensive, as it requires re-training the model. Once these training configurations are set, they become fixed, creating a \"natural experiment\" where modifying the experimental conditions incurs high computational costs. Using econometric techniques to estimate causal effects from observational studies enables us to analyse the impact of these choices without requiring full experimental control or repeated model training. In this talk, I will present our paper, *Causal Estimation of Memorisation Profiles* (Best Paper Award at ACL 2024), which introduces a novel method based on the difference-in-differences technique from econometrics to estimate memorisation without requiring model re-training. I will also discuss preliminary results from ongoing work that applies the regression discontinuity design to estimate the causal effect of selecting a specific vocabulary size."

# Talk start and end times.
# End time can optionally be hidden by prefixing the line with `#`.
date: 2024-10-10T13:00:00Z
date_end: 2024-10-10T14:00:00Z
all_day: false

# Schedule page publish date (NOT event date).
publishDate: 2024-10-04T00:00:00Z

authors: [camachocolladosj]
tags: []

# Is this a featured event? (true/false)
featured: false

# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
  caption: ""
  focal_point: ""
  preview_only: false

# Custom links (optional).
# Uncomment and edit lines below to show custom links.
# links:
# - name: Follow
#   url: https://twitter.com
#   icon_pack: fab
#   icon: twitter

# Optional filename of your slides within your event's folder or a URL.
url_slides:

url_code:
url_pdf:
url_video:

# Markdown Slides (optional).
# Associate this event with Markdown slides.
# Simply enter your slide deck's filename without extension.
# E.g. `slides = "example-slides"` references `content/slides/example-slides.md`.
# Otherwise, set `slides = ""`.
slides: ""

# Projects (optional).
# Associate this post with one or more of your projects.
# Simply enter your project's folder or file name without extension.
# E.g. `projects = ["internal-project"]` references `content/project/deep-learning/index.md`.
# Otherwise, set `projects = []`.
projects: []
---

**Invited Speaker:** [Pietro Lesci](https://pietrolesci.github.io/) (University of Cambridge)

**Bio:**
Pietro Lesci is a PhD student in Natural Language Processing at the University of Cambridge, working with Professor Andreas Vlachos. His research explores the causal effects of training choices on language models, focusing on memorisation, shortcut learning, and tokenisation. His work has been presented at major NLP conferences like ACL and NAACL. He received a Best Paper Award at ACL 2024 and funding from the Translated Imminent Research Grant for his research contributions. Pietro's experience spans academia and industry, including roles at Amazon AWS AI Labs, Bain & Company, and Bocconi University. He holds an MSc in Economic and Social Sciences from Bocconi University.
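
The abstract above names the difference-in-differences technique from econometrics. As a purely illustrative aside (all numbers hypothetical, not from the paper, and the function name is our own), the classic 2x2 estimator it builds on can be sketched as:

```python
def did_estimate(treated_before: float, treated_after: float,
                 control_before: float, control_after: float) -> float:
    """Classic 2x2 difference-in-differences: the treated group's change
    minus the control group's change, which stands in for the shared trend
    both groups would have followed without treatment."""
    return (treated_after - treated_before) - (control_after - control_before)

# Hypothetical memorisation scores before/after an instance is seen in
# training (treated) versus instances never seen at that step (control).
effect = did_estimate(treated_before=0.40, treated_after=0.75,
                      control_before=0.42, control_after=0.50)
print(f"Estimated effect: {effect:.2f}")  # → 0.27
```

The control group's change (0.08) is subtracted from the treated group's change (0.35), so only the treatment-attributable shift remains; the paper's actual estimator operates over training-step panels, not a single 2x2 table.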
