data_template.json
{
"lorebookVersion": 4,
"entries": [
{
"text": "genesis of this project was so i mean, if i go back to kind of what my phd is about, i'm working on, like the intersection of fact checking and scholarly document understanding. and one of the problems with doing research in this area is there isn't a lot of data to be able to perform these tasks. so we were kind of investigating, like, what are the problems in this area? what are things that could be useful for people that we could try and, you know, start hacking away at and making some progress on and we stumbled upon these papers, which describes how press releases about scientific papers tend to exaggerate the findings of those papers, and that this has some real world consequences because they and the quote is like the dominant link between news articles about scientific papers and the scientific papers themselves are these press releases. so there's this direct link between when press release exaggerate the findings of the papers and when news articles do which has downstream consequences. this is obviously a sensitive topic these days or every president yeah, there's something that affects everybody in different domains and i'm gonna do my best to not get distracted and keep us narrowed into what you're working on which is healthcare research in particular. yeah, so the nice thing about those studies is that they actually manually labeled a lot of like a pretty good clutch of data, basically reading the papers, looking at the main findings of those papers, and then reading the press releases and looking at how they describe those main findings. and then basically label kind of the cause of claim strength. so if a paper makes a direct causal claim, like x causes y, and the press release, say it makes the same kind of x causes y claim that would be an accurate representation of what the paper said. but there are a lot of cases where the paper will say like x is related to y or x is correlated with y, but then the press release and consequently, the downstream news article will say x causes y it's like amazing new research findings. right? right, right. and a lot of people, myself included, are more apt to digest the press release. than actually click through and read the whole paper, let alone be able to understand and medical paper, etc, etc. and then you wind up with all these people like me running around thinking, oh, you know, fish oil might actually regrow hair on my head because i read the press release that yeah, yeah, exactly. so we were kind of asking is there a way that ai tools for machine learning could help detect when this happens with as little labeled data as possible? because again, finding labeled data for this problem is really difficult. so fortunately, there was this kind of 800 pairs of press releases and abstracts we can use, which is still if you want to train a model, not a lot of data, right? so we were like, can we use this as test data and come up with some techniques using say, like few shot learning to be able to kind of make progress on it before we dive into that? why? medical research, healthcare research in particular? yeah, so the fact of the matter is that this is kind of what people are interested in. and so like this past research on exaggeration was focused specifically on healthcare research. okay, so you've got these 800 i don't want to put words in your mouth to say it again. you got 800 pairs of yeah, the abstract or press release or whatever it is paired with the actual research paper. yeah. 
And then we cleaned and curated this data to get the highest quality data that we could, so the best signal, and that took it down to about 663 pairs, 100 of which we can use as training data; the rest we hold out as test data. So now we have this problem where, okay, we have these 100 samples, with some press releases exaggerating, some accurately reflecting, and some downplaying the findings of the papers. How can we use this really tiny dataset to be able to perform this task, right? So, as you do, we read a bunch of papers, and there's this emerging paradigm in natural language processing. I mean, there's been a paradigm shift in the past few years, where we're using these large pre-trained language models, so very large transformer models, if you're familiar with them, which are trained on these massive corpora of text and which have been shown to be really useful for a lot of downstream tasks; you can fine-tune them on a variety of tasks and they perform really well. What's the timeframe? When did you start working on the project? So we started at the beginning of this year, so probably around January 2020. Wow, this is all hot off the presses with the most recent language models, data, everything. Okay, great. Yeah, exactly. So there was this paradigm shift, and people were doing classic supervised learning with these models on various downstream tasks and getting strong performance, but there's been another, more recent shift, where people have been asking: can we use these language model pre-training objectives in some clever ways to be able to do few-shot learning? So there have been these methods that are called prompt-based learning methods. There are kind of two ways to do this. One notable example is a model called GPT-3, which is this massive language model trained by OpenAI that you can basically give these tiny prompts, so a description of what the task is and two examples, and it is able to do inference on these downstream tasks. One example of that is, give it a description of a web page, say 'give me a web page', and give it like two examples, and then you can have your model basically generate web pages for you based on a description. So that's one way of doing this. And then another way, which is what we were looking into, is this pattern-exploiting training, and what that's doing is, instead of having the classification objective, where you have some input text and some label, which you know means something and which you train your model to predict, you're basically saying: can we leverage language, and the learned patterns that the language model has picked up on from masked language model pre-training, and do classification with basically actual text? So what you do is you develop these prompts. In our case, our input text could be something like 'chocolate causes happiness', and we could create a prompt that says 'scientists say chocolate causes happiness, the claim strength is ...', and you're basically training your model to predict an actual word for that masked position. So you can have 'causal' or 'conditional' or 'correlational' or 'non-causal' as your quote-unquote labels, which are now actual tokens in your vocabulary.
So we were going from this perspective of prompt-based learning with pattern-exploiting training and seeing what improvements we could make on that in this space, specifically working with exaggeration. So we thought, okay, we actually have two different complementary tasks: we have the task of actually predicting exaggeration, which is something that takes two pieces of text and says, okay, this piece of text is exaggerating this one, and then we also have this other task of claim strength, so like, this claim is a causal claim and this claim is a correlational claim, and that could be informative for something like exaggeration. So basically, instead of just doing a single task, when you train this model you have these complementary tasks; in our case, one task for exaggeration and one task for claim strength prediction. And we saw that if you do this, you can actually get improved performance, so an improved ability to predict the exaggeration, that sentence one is exaggerating sentence two, for example. What is few-shot learning? What is that? So if we think about how machine learning models are generally trained, usually you need massive amounts of labeled data to be able to train them well. Few-shot learning is asking how little data we can get away with to perform well on the task; the few shots are like a few examples, more or less. Got it, okay. How did it go? What did you find? How much are people exaggerating, and what was the process like of honing in on and figuring all that out? I mean, the literature shows that about 33% of the time, press releases will exaggerate scientific papers, and as a result, that means about 33% of news articles exaggerate the findings in scientific papers. Not to sound like a cranky ex-journalist, but that number actually is lower than I would have guessed. Really? Multiple people told me that, yeah. It's really interesting that journalists are the ones who say that, but I mean, I think it makes sense, because the incentives in journalism are to get eyes on your article. I mean, there's so much that goes into it, but yeah, I don't want to derail you. So about 33% are already exaggerating, and did your model match up, predict, or grok that accurately? So we didn't actually do that experiment. I mean, there's previous work that has done that on, like, a bunch of unlabeled data. Yeah, okay. We were more focused on curation. Yeah, so that's been done before, right: train a model for causal claim strength prediction and then, on a bunch of unlabeled data, see what the percentage of exaggeration is. But we were more interested in curating this dataset so that we could actually benchmark the performance and compare methods, so that we can gauge progress on this task, as opposed to saying, okay, we trained a model, we tested it on some unlabeled data, and this is what our model says is the rate of exaggeration, right? Which could be correct or could not be correct.
I mean, when you're doing that, you're basically trusting that your model is as accurate as possible without actually knowing how accurate it is. How did you walk through, you know, tweaking and honing in on your model and your process to refine your results? So the interesting thing, actually, with this prompt-based learning approach is that it's much less about tweaking models and more about finding what the best prompts are and what the best, what are called, verbalizations of your labels are. So we actually spent a good amount of time trying to hone in on what pattern a language model would be able to understand when it's trying to get at something like exaggeration or claim strength. So we go through this iterative process of, okay, let's try just a, not even human-interpretable, pattern where we just put a mask token in brackets and try to predict what that mask token is, and then after that put the input text, like 'chocolate causes happiness'. And we go through this iterative approach where we build these out a bit better and then test on some training data to see, okay, are we doing better, gauging the performance of these prompts, more or less. And then, at the same time, there's another method that came out from the same people, which is called PETAL, which is a way to, using your training data, find the best verbalizations of the labels that you're interested in. So we come up with a prompt, like for exaggeration: 'scientists say' and then the text from the paper, 'in contrast, journalists say' blank, and then the text from the press release. And then you're more or less probing the language model for what it sees as logical labels for the masked tokens. And we actually found that the language model would come up with sensible tokens to replace these labels. So for example, for 'exaggerates' it would come up with things like mistaken, wrong, hollow, naive, false, lies; for 'downplays' it would come up with things like preliminary, uncertain, hypothetical. So it's really interesting, because you're actually also probing the language model for these patterns it's picked up from pre-training, which, as a human reading this, makes sense given what the task is, which is reflecting whether something is exaggerating something else. And so I don't mean to, you know, ask you to describe for our audience what's wrong with journalism or what's wrong with, you know, social media or whatever, but as I'm listening to you talk, in particular about the probing of the language patterns and the prompts and everything, I'm thinking, oh man, if we just get the AI good enough, right (it's a gross blanket term, sorry), but if we get this system, you know, to be accurate and good enough, then it can write press releases that are more accurate and do a better job of summarizing. And so then, you know, I'm thinking, well, okay, why aren't humans doing this? And it's like, well, there are all these incentives to drive traffic, or perhaps there's an agenda that's not necessarily nefarious, but people have agendas, conscious or not, et cetera, and people are subject to the incentive structure, whatever they're doing. Exactly, exactly. So I mean, what, on sort of the, I don't know, human or societal side of things,
is there a learning that you and Isabelle are walking away from this with? Is it that the incentives are misaligned? Is it that reporters maybe don't have the time or the interest or maybe the technical background to read the research reports and understand them, so they rely on these press releases? You know, is there something on the human workflow side of things that jumped out to you as being a pain point we could address? To me, the biggest issue is the incentive structure. Yeah. And it's not just people that write about science that do this kind of thing; the scientists themselves can sometimes do this, because the incentives in academia and in science in general are to publish. There's a whole saying, publish or perish, which is an unfortunate, I don't know if it's a truth, but it's an unfortunate thing that people have to reckon with. So I think the incentive structures are set up so that this kind of thing happens. I'm definitely not an expert on how to change those incentive structures. My hope, and maybe this is a pie-in-the-sky hope, is that there would be good actors who would do their due diligence and do basic things like talk to the scientists when they write the press releases, or, when they write news articles, actually read the research and talk to the people that did the research and try to give the most accurate reflection of what was done, which is what I would hope. We're not there yet, but this is progress towards a system that could assist, for example, journalists in ensuring that they're getting accurate reporting on science. So if I could dream of where this project could go in, like, five years, for example, maybe even less time than that, I would hope that this could turn into, say, a tool that science journalists could use to run their own fact checks before publishing, right, in lieu of having a fact checker, just very basic, like, am I accurately representing what the science is? I'm just imagining, you know, a highlighted area in my document and it's saying, you know, this claim is suspicious, we're flagging this claim: are you perhaps being naive, hollow, inauthentic? See, this is the other thing: we need to work with people who do, like, HCI research, for example, to figure out what the right interventions are. Totally, and what the best way is to prompt, for example, a journalist to check that. Right, no, totally. And I do a lot of writing, you know; these days it's more on the marketing and communication side of things than journalism, but I've noticed over the past year, especially when I'm doing, not technical like computer science, but technical writing that involves lingo particular to a domain, that some of the autocomplete systems in, you know, online writing tools have just become more and more accurate in terms of predicting not just how I might want to end my sentence so that it makes sense, but getting the technical lingo and using it in a very... accurate is really the word that comes to mind, right? Because, you know, you get into these distinctions between, in our case, right, like pattern-exploiting training and, you know, a similar but different technique, and getting into the nitty-gritty of being able to suggest a completion.
Obviously, you can flip that around, like you're saying, and run it as sort of a fact-checker-type system after the fact. Yeah, the applications that really excite me in machine learning and artificial intelligence are these things, tasks that would take a person a long time and a lot of cognitive work to be able to do, but that just speed up our work and are used in tandem with people. Human in the loop. Yeah, exactly. And particularly, with fact checking, to do things which have the potential for public good, right. I'm speaking with Dustin Wright. Dustin is a PhD student in the Department of Computer Science at the University of Copenhagen in Denmark, and he and his colleague Isabelle Augenstein have a new paper out in which they explain how they use GPUs to train an exaggeration detection system to identify claims that might be, you know, a little overenthusiastic, because that's the word we're using, in health science reporting in particular. Dustin, you mentioned that you are in Denmark, in Copenhagen, right now as we record this, and I'm here in California in the US. I'm wondering, to ask a big dumb question: how'd you get there? How'd you wind up in Copenhagen? How'd you wind up studying what you're studying and working on NLP and exaggeration detection and all that good stuff? I'm originally from San Diego, and I did my master's at UC San Diego. And, I mean, I had been wanting to do a PhD since I was in my bachelor's, so since I was like 20, which, okay, 20 is not that young, but still, it's been a while now. Half your life at that point, so, you know. Yeah, so I was doing my master's at UC San Diego and finding my research niche. So I was doing my master's in computer science and was exploring what areas of research could be interesting to work on. Sure. And I ended up working on this project for my thesis that was about biomedical NLP. The other background, and maybe this is also because I'm from the US, is that I have this kind of interest in conspiracy theories and the psychology of how people come to believe certain things, right? This started out as kind of a casual, fun thing, like, oh yeah, people that believe in flat earth are kind of interesting, that's an interesting conspiracy. I mean, nowadays it's a bit... I'm laughing because I'm afraid if I start crying on the show it's going to drive listeners away. So yeah. So I've always had an interest in that, and in why people believe these things, and how there could be interventions to, say, bring people back from them, sure, or prevent people from falling prey to these kinds of conspiracies. And so the way I ended up in Copenhagen and working on scientific fact checking is, actually, so Isabelle, she's my advisor: I followed her on Twitter and happened to see her post about the position one day, and then did some research on what her background is and what kind of research she did, and she's been working on problems in fact checking and scientific document processing for a while now. Cool, okay. So it just seemed like a perfect match, so I just messaged her and was like, hey, I'm interested. So I mean, that's more or less how I found her. And yeah, so how long ago did you move from California to Denmark? Yes, it's been about two years. And to veer off to, you know, something unrelated to computer science per se, but super interesting.
How do you find it? How's that? You know, I'm guessing there are fewer conspiracy theorists. Yeah, Copenhagen and San Diego, a stark contrast. Yeah, this is my impression, and bear in mind, I'm still learning Danish, so there's probably some news and stuff that I don't understand quite yet, but people here tend to, in my impression, trust the government quite a lot compared to people in the US, right. So I can give some examples: we were pretty late in mandating masks during the pandemic.",
"contextConfig": {
"prefix": "",
"suffix": "\n",
"tokenBudget": 2048,
"reservedTokens": 0,
"budgetPriority": 400,
"trimDirection": "trimBottom",
"insertionType": "newline",
"maximumTrimType": "sentence",
"insertionPosition": -1
},
"lastUpdatedAt": 1666846259188,
"displayName": "genesis of...",
"id": "103992058176613542042002759608477372172",
"keys": [],
"searchRange": 10000,
"enabled": true,
"forceActivation": false,
"keyRelative": false,
"nonStoryActivatable": false,
"category": "",
"loreBiasGroups": [
{
"phrases": [],
"ensureSequenceFinish": false,
"generateOnce": true,
"bias": 0,
"enabled": true,
"whenInactive": false
}
]
}
],
"settings": {
"orderByKeyLocations": false
},
"categories": []
}
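The transcript stored above describes pattern-exploiting training: replacing a classification head with a cloze-style prompt and single-token "verbalizers" for the labels. As a rough illustration only, here is a minimal zero-shot sketch of that idea in Python with the Hugging Face transformers library. The model name, prompt wording, verbalizer words, and example claim are all assumptions for illustration, not the setup from the paper, and real PET additionally fine-tunes the model on the few labeled examples.

```python
# Minimal sketch of prompt-based (PET-style) claim-strength scoring with a masked LM.
# Assumptions: roberta-base, this prompt wording, and these verbalizer words are
# illustrative only; they are not taken from the paper described in the interview.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

# Labels become ordinary words ("verbalizers") predicted at the mask position.
VERBALIZERS = {
    "causal": "causal",
    "conditional": "conditional",
    "correlational": "correlational",
    "no_relation": "unrelated",
}

def claim_strength_scores(claim: str) -> dict:
    """Score each label by the masked-LM logit of its verbalizer's first subword."""
    prompt = f"Scientists say {claim}. The claim strength is {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Position of the single mask token in the input sequence.
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    mask_logits = logits[0, mask_pos]
    scores = {}
    for label, word in VERBALIZERS.items():
        # Multi-subword verbalizers are approximated by their first subword here;
        # PET proper handles them more carefully and fine-tunes on the labeled data.
        first_id = tokenizer(" " + word, add_special_tokens=False).input_ids[0]
        scores[label] = mask_logits[first_id].item()
    return scores

if __name__ == "__main__":
    print(claim_strength_scores("chocolate causes happiness"))
```

In the multi-task setup the interview describes, one would add a second prompt over the paper/press-release pair for the exaggeration label and train both tasks with the same underlying model.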
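The interview also mentions probing the language model for sensible label words, in the spirit of PETAL, which searches for verbalizations using the training data. The sketch below shows only the probing half, inspecting the model's top predictions at the masked label position for one hand-written prompt; the model choice, prompt, and the two example claims are hypothetical.

```python
# Sketch of probing a masked LM for candidate verbalizer words. PETAL proper scores
# candidates against the labeled training pairs; this only prints the model's top
# guesses for a single hand-written exaggeration prompt.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")  # illustrative model choice

# Hypothetical paper finding and press-release claim (not from the dataset).
paper_claim = "dark chocolate consumption is associated with self-reported happiness"
press_claim = "chocolate causes happiness"

prompt = (
    f"Scientists say {paper_claim}. "
    f"In contrast, journalists say {press_claim}. "
    f"The journalists' claim is {fill_mask.tokenizer.mask_token}."
)

# Top candidate tokens for the masked label position.
for candidate in fill_mask(prompt, top_k=10):
    print(f"{candidate['token_str'].strip():>15}  {candidate['score']:.3f}")
```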