Problem with New York Times stories #76

gautamh · 2017-06-03T04:15:01Z

The text field is empty when running unfluff on the html from a New York Times story. For example, if I request a story from nytimes.com in the node console and then pass the page html to unfluff, the returned text field is empty:

const request = require('request');
const unfluff = require('unfluff');

request({uri: 'https://www.nytimes.com/2017/06/01/climate/trump-paris-climate-agreement.html', jar: true}, function(e, r, b) {
  console.log(unfluff(b));
});

Result:

{
  title: 'Trump Will Withdraw U.S. From Paris Climate Agreement',
  softTitle: 'Trump Will Withdraw U.S. From Paris Climate Agreement',
  date: '2017-06-01T14:48:08-04:00',
  author: [ 'Michael D. Shear', 'https://www.nytimes.com/by/michael-d-shear' ],
  publisher: undefined,
  copyright: '2017 The New York Times Company',
  favicon: 'https://static01.nyt.com/favicon.ico',
  description: 'The withdrawal process could take four years to complete, meaning a final decision would be up to the American voters in the next presidential election.',
  keywords: 'United Nations Framework Convention on Climate Change,Trump Donald J,United States Politics and Government,Global Warming',
  lang: 'en',
  canonicalLink: 'https://www.nytimes.com/2017/06/01/climate/trump-paris-climate-agreement.html',
  tags: [],
  image: 'https://static01.nyt.com/images/2017/06/02/us/02climatesub-alpha1/02climatesub-alpha1-facebookJumbo.jpg',
  videos: [],
  links: [],
  text: ''
}

I've tried a couple of different Times URLs and confirmed that request is indeed passing the correct page HTML to the callback.

ageitgey · 2017-06-05T20:28:57Z

I looked at the HTML for that page. Unfortunately, the nytimes markup does a lot that makes it hard to grab the content in a generic way.

Here's a sample of some of the html:

                         <h2 class="interactive-headline">
                            The U.S. Is the Biggest Carbon Polluter in History. It Just Walked Away From the Paris Climate Deal.            
                          </h2>
                          <p class="interactive-summary">
                            The United States has emitted more planet-warming carbon dioxide into the atmosphere than any other country. Now it is walking back a promise to lower emissions.            
                          </p>
                        </figcaption>
                        <div class="interactive-image-container">
                          <div class="interactive-image">
                            <img src="https://static01.nyt.com/images/2017/05/27/climate/the-us-led-the-world-in-carbon-emissions-its-falling-behind-on-solutions-1495840402762/the-us-led-the-world-in-carbon-emissions-its-falling-behind-on-solutions-1495840402762-master495-v3.jpg" />
                          </div>
                          <div class="interactive-overlay">
                            <i class="icon sprite-icon interactive-overlay-icon"></i>
                          </div>
                        </div>
                      </a>
                    </figure>
                    <p class="story-body-text story-content" data-para-count="230" data-total-count="6521">On Twitter, Miguel Arias Cañete, the European Union’s commissioner for climate, said that “today’s announcement has galvanized us rather than weakened us, and this vacuum will be filled by new broad committed leadership.”</p>
                    <div id="story-ad-3" class="story-ad ad ad-placeholder nocontent robots-nocontent  ad-aggro_4-4-8 ad-aggro_4-5-7">
                      <div class="accessibility-ad-header visually-hidden">
                        <p>Advertisement</p>
                      </div>
                      <a class="visually-hidden skip-to-text-link" href="#story-continues-12">Continue reading the main story</a>
                    </div>

They break the text into such small pieces, with each paragraph interleaved with ad placeholders, figures, and skip links, that unfluff never finds a chunk of text big enough to look "real", so it returns nothing. Making unfluff work for this site would require a lot of custom code that only applies here, which I don't really want to add.

For this case, I'd recommend writing some custom code to parse nytimes.com pages.

Here's some simple node.js code that will grab the contents of the story:

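// Load a saved copy of the article HTML
const fs = require('fs');
const cheerio = require('cheerio');

const html = fs.readFileSync('story.html', 'utf8');
const $ = cheerio.load(html);

// Story paragraphs carry the story-body-text class (see the sample above)
const textTags = $('.story-body-text');
// .text() concatenates the text of all matched elements with no separator
const storyText = textTags.text();

console.log(storyText);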

When I run that, I get:

$ node nytimes.js 

WASHINGTON — President Trump announced on Thursday that the United States would withdraw from the Paris climate accord, weakening efforts to combat global warming and embracing isolationist voices in his White House who argued that the agreement was a pernicious threat to the economy and American sovereignty.In a speech from the Rose Garden, Mr. Trump said the landmark 2015 pact imposed wildly unfair environmental standards on American businesses and workers. He vowed to stand with the people of the United States against what he called a “draconian” international deal.“I was elected to represent the citizens of Pittsburgh, not Paris,” the president said, drawing support from members of his Republican Party but widespread condemnation from political leaders, business executives and environmentalists around the globe.Mr. Trump’s decision to abandon the agreement for environmental action signed by 195 nations is a remarkable rebuke to heads of state, climate activists, corporate executives and members of the president’s own staff, who all failed to change his mind with an intense, last-minute lobbying blitz. The Paris agreement was intended to bind the world community into battling rising temperatures in concert, and the departure of the Earth’s second-largest polluter is a major blow.Mr. Trump said he wanted to negotiate a better deal for the United States, and the administration said he had placed calls to the leaders of Britain, France, Germany and Canada to personally explain his decision. A statement from the White House press secretary said the president “reassured the leaders that America remains committed to the trans-Atlantic alliance and to robust efforts to protect the environment.”But within minutes of the president’s remarks, the leaders of France, Germany and Italy issued a joint statement saying that the Paris climate accord was “irreversible” and could not be renegotiated.The decision was a victory for Stephen K. Bannon, Mr. 
Trump’s chief strategist, and Scott Pruitt, the Environmental Protection Agency administrator, who spent months quietly making their case to the president about the dangers of the agreement. Inside the West Wing, the pair overcame intense opposition from other top aides, including Gary D. Cohn, the director of the National Economic Council, the president’s daughter Ivanka Trump, and his secretary of state, Rex Tillerson.Ms. Trump, in particular, fought to make sure that her father heard from people supportive of the agreement, setting up calls and meetings with world leaders, corporate executives and others. But by Thursday, aides who pushed to remain part of the agreement were disconsolate, and it was Mr. Pruitt whom the president brought up for victory remarks at the Rose Garden event.The president’s speech was his boldest and most sweeping assertion of an “America first” foreign policy doctrine since he assumed office four months ago. He vowed to turn the country’s empathy inward, rejecting financial assistance for pollution controls in developing nations in favor of providing help to American cities struggling to hire police officers.“It would once have been unthinkable that an international agreement could prevent the United States from conducting its own domestic affairs,” Mr. Trump said.In Mr. Trump’s view, the Paris accord represents an attack on the sovereignty of the United States and a threat to the ability of his administration to reshape the nation’s environmental laws in ways that benefit everyday Americans.“At what point does America get demeaned? At what point do they start laughing at us as a country?” Mr. Trump said. “We don’t want other leaders and other countries laughing at us anymore. And they won’t be.”But business leaders like Elon Musk of Tesla, Jeffrey R. Immelt of General Electric and Lloyd C. 
Blankfein of Goldman Sachs said the decision would ultimately harm the economy by ceding the jobs of the future in clean energy and technology to overseas competitors.Mr. Musk, who had agreed to be a member of a two business-related councils that Mr. Trump set up this year, wrote on Twitter that he would leave those panels.“Climate change is real. Leaving Paris is not good for America or the world,” he said.Under the accord, the United States had pledged to cut its greenhouse gas emissions 26 to 28 percent below 2005 levels by 2025 and commit up to $3 billion in aid for poorer countries by 2020.By stepping away from the Paris agreement, the president made good on a campaign promise to “cancel” an agreement he repeatedly mocked at rallies. As president, he has moved rapidly to reverse Obama-era policies aimed at allowing the United States to meet its pollution-reduction targets as set under the agreement.“We are getting out,” Mr. Trump said Thursday. “But we will start to negotiate, and we will see if we can make a deal that’s fair. And if we can, that’s great.”In his remarks, Mr. Trump listed sectors of the United States economy that would lose revenue and jobs if the country remained part of the accord, citing a study — vigorously disputed by environmental groups — asserting that the agreement would cost 2.7 million jobs by 2025.But he will stick to the withdrawal process laid out in the Paris agreement, which President Barack Obama joined and most of the world has already ratified. That could take nearly four years to complete, meaning a final decision would be up to the American voters in the next presidential election.Republican lawmakers hailed Mr. Trump’s decision, calling it a necessary antidote to the overreach of Mr. 
Obama’s policies aimed at reducing planet-warming carbon emissions.“I applaud President Trump and his administration for dealing yet another significant blow to the Obama administration’s assault on domestic energy production and jobs,” said Senator Mitch McConnell of Kentucky, the majority leader.But Mr. Trump’s call for new global negotiations about the planet’s climate drew derision from Democrats in the United States and other heads of state.President Emmanuel Macron of France and Prime Minister Justin Trudeau of Canada each issued rebukes to Mr. Trump. “Make our planet great again,” Mr. Macron said.On Twitter, Miguel Arias Cañete, the European Union’s commissioner for climate, said that “today’s announcement has galvanized us rather than weakened us, and this vacuum will be filled by new broad committed leadership.”Mr. Obama, in a rare assertion of his political views as a former president, said, “The nations that remain in the Paris agreement will be the nations that reap the benefits in jobs and industries created.”“Even in the absence of American leadership; even as this administration joins a small handful of nations that reject the future; I’m confident that our states, cities, and businesses will step up and do even more to lead the way, and help protect for future generations the one planet we’ve got,” Mr. Obama said.In recent days, Mr. Trump withstood withering criticism from European counterparts who accused him of shirking America’s role as a global leader and America’s responsibility as history’s largest emitter of planet-warming greenhouse gasses.After a fierce debate inside the administration, the White House on Thursday took on the trappings of a celebration. The Rose Garden was packed with reporters, activists and members of Mr. Trump’s administration. 
Scores of staff members lined the sides of the Rose Garden as a military band played soft jazz.Supporters of the Paris agreements reacted with pent-up alarm, condemning the administration for shortsightedness about the planet and a reckless willingness to shatter longstanding diplomatic relationships.“Removing the United States from the Paris agreement is a reckless and indefensible action,” said Al Gore, the former vice president who has become an evangelist for fighting climate change. “It undermines America’s standing in the world and threatens to damage humanity’s ability to solve the climate crisis in time.”Corporate leaders also condemned Mr. Trump’s action.On its website, I.B.M. reaffirmed its support for the Paris agreement and took issue with the president’s contention that it was a bad deal for American workers and the American economy.“This agreement requires all participating countries to put forward their best efforts on climate change as determined by each country,” the company said. “I.B.M. believes that it is easier to lead outcomes by being at the table, as a participant in the agreement, rather than from outside it.”Mr. Immelt, the chairman and chief executive of General Electric, took to Twitter to say he was “disappointed” with the decision. “Climate change is real,” he said. “Industry must now lead and not depend on government.”But Mr. Trump was resolute.“It is time to put Youngstown, Ohio; Detroit, Mich.; and Pittsburgh, Pa., along with many, many other locations within our great country, before Paris, France,” he said. “It is time to make America great again.”The mayor of Pittsburgh, Bill Peduto, responded on Twitter, “I can assure you that we will follow the guidelines of the Paris Agreement for our people, our economy & future.”

justinmchase · 2018-08-10T01:22:25Z

I came here to report this as well.
https://www.nytimes.com/2018/08/08/business/elon-musk-tesla-sec.html

This only extracts about 200 words of this article.
