Class: NVDFeedScraper

Inherits:
Object
Includes:
NvdFeedApi
Defined in:
lib/nvd_feed_api.rb,
lib/nvd_feed_api/feed.rb,
lib/nvd_feed_api/meta.rb

Overview

The class that parses the NVD website to get feed information.

Examples:

Initialize an NVDFeedScraper object, scrape the feed list, and inspect the feeds:

scraper = NVDFeedScraper.new
scraper.scrap
scraper.available_feeds
scraper.feeds
scraper.feeds("CVE-2007")
cve2007, cve2015 = scraper.feeds("CVE-2007", "CVE-2015")

Defined Under Namespace

Classes: Feed, Meta

Constant Summary

BASE =
'https://nvd.nist.gov'.freeze
URL =

The NVD URL where the data feeds are located.

"#{BASE}/vuln/data-feeds".freeze

Constants included from NvdFeedApi

NvdFeedApi::VERSION

Instance Method Summary

Constructor Details

#initialize ⇒ NVDFeedScraper

Initialize the scraper.



# File 'lib/nvd_feed_api.rb', line 28

def initialize
  @url = URL
  @feeds = nil
end

Instance Method Details

#available_cves ⇒ Array<String>

Return a list with the names of all available CVEs across the yearly feeds. Can only be called after #scrap.

Returns:

  • (Array<String>)

    List with the names of all available CVEs. May return tens of thousands of CVEs.



# File 'lib/nvd_feed_api.rb', line 291

def available_cves
  cve_names = []
  feed_names = available_feeds
  feed_names.delete('CVE-Modified')
  feed_names.delete('CVE-Recent')
  feed_names.each do |feed_name|
    f = feeds(feed_name)
    f.json_pull
    # merge removing duplicates
    cve_names |= f.available_cves
  end
  return cve_names
end
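
The merge step in the listing above relies on Array#| to fold each feed's IDs into the running list without duplicates. A minimal sketch of that idiom, assuming a hypothetical hash of feed names to CVE IDs in place of real Feed objects (merge_cves is an illustrative name, and the duplicate entry is contrived):

```ruby
# Hypothetical helper: unions the per-feed ID lists into one list.
# cves_by_feed stands in for what Feed#available_cves would return per feed.
def merge_cves(cves_by_feed)
  merged = []
  # Array#| keeps the order of first appearance and drops duplicates.
  cves_by_feed.each_value { |names| merged |= names }
  merged
end

sample = {
  'CVE-2014' => ['CVE-2014-0001', 'CVE-2014-0002'],
  'CVE-2015' => ['CVE-2015-0001', 'CVE-2014-0002'] # contrived duplicate
}
puts merge_cves(sample).inspect
```

Since every yearly feed is pulled before its IDs are merged, a call to #available_cves is likely to be slow and memory hungry on the full feed set.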

#available_feeds ⇒ Array<String>

Return a list with the names of all available feeds. The returned feed names can be used as arguments for the #feeds method. Can only be called after #scrap.

Examples:

scraper.available_feeds # => ["CVE-Modified", "CVE-Recent", "CVE-2017", "CVE-2016", "CVE-2015", "CVE-2014", "CVE-2013", "CVE-2012", "CVE-2011", "CVE-2010", "CVE-2009", "CVE-2008", "CVE-2007", "CVE-2006", "CVE-2005", "CVE-2004", "CVE-2003", "CVE-2002"]

Returns:

  • (Array<String>)

    List with the names of all available feeds.



# File 'lib/nvd_feed_api.rb', line 132

def available_feeds
  raise 'call scrap method before using available_feeds method' if @feeds.nil?

  feed_names = []
  @feeds.each do |feed| # feed is an object
    feed_names.push(feed.name)
  end
  feed_names
end
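
Because #available_feeds raises a RuntimeError when it is called before #scrap, callers may want to handle that case explicitly. The sketch below uses a hypothetical StubScraper that mirrors only the nil guard from the listing; it does not touch the network or the real class, and its feed names are placeholders.

```ruby
# Hypothetical stand-in for the scraper: it reproduces only the guard logic.
class StubScraper
  def initialize
    @feeds = nil
  end

  # Pretend scrape: stores plain names instead of Feed objects.
  def scrap
    @feeds = ['CVE-Modified', 'CVE-Recent', 'CVE-2017']
    @feeds.size
  end

  def available_feeds
    raise 'call scrap method before using available_feeds method' if @feeds.nil?

    # Return a fresh array so callers can modify it without touching @feeds.
    @feeds.dup
  end
end

scraper = StubScraper.new
begin
  scraper.available_feeds
rescue RuntimeError => e
  puts "not ready yet: #{e.message}"
end

scraper.scrap
puts scraper.available_feeds.inspect
```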

#cve(cve) ⇒ Hash #cve(cve_arr) ⇒ Array #cve(cve, *) ⇒ Array

TODO:

Implement a CVE class instead of returning a Hash. Results may not be in the same order as provided.

Note:

#scrap must be called before using this method.

Search for CVEs in all yearly feeds.

Examples:

s = NVDFeedScraper.new
s.scrap
s.cve("CVE-2014-0002", "cve-2014-0001")

Overloads:

  • #cve(cve) ⇒ Hash

    One CVE.

    Parameters:

    • cve (String)

      CVE ID, case insensitive.

    Returns:

    • (Hash)

      A Ruby Hash corresponding to the CVE.

  • #cve(cve_arr) ⇒ Array

    An array of CVEs.

    Parameters:

    • cve_arr (Array<String>)

      Array of CVE IDs, case insensitive.

    Returns:

    • (Array)

      An Array of CVEs; each CVE is a Ruby Hash. May not be in the same order as provided.

  • #cve(cve, *) ⇒ Array

    Multiple CVEs.

    Parameters:

    • cve (String)

      CVE ID, case insensitive.

    • * (String)

      As many CVE IDs as you want.

    Returns:

    • (Array)

      An Array of CVEs; each CVE is a Ruby Hash.
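
For the single-ID overload, the year embedded in the CVE ID decides which yearly feed is searched. The following is a rough sketch of that mapping, not the library's code: feed_for, CVE_ID and NON_YEARLY are hypothetical names, the sketch raises ArgumentError and returns nil where the real method raises a plain RuntimeError, and it anchors the pattern with \A and \z.

```ruby
# Case-insensitive CVE ID pattern; the capture group is the four-digit year.
CVE_ID = /\ACVE-([0-9]{4})-[0-9]{4,}\z/i
NON_YEARLY = ['CVE-Modified', 'CVE-Recent'].freeze

# Hypothetical helper: returns the yearly feed name expected to hold cve_id,
# or nil when no feed covers that year.
# feed_names is assumed to look like the output of #available_feeds.
def feed_for(cve_id, feed_names)
  match = CVE_ID.match(cve_id)
  raise ArgumentError, "bad CVE name: #{cve_id}" if match.nil?

  year = match[1]
  found = (feed_names - NON_YEARLY).find { |name| name.end_with?(year) }
  # The CVE-2002 feed also contains the CVEs from 1999 to 2001.
  found = 'CVE-2002' if found.nil? && %w[1999 2000 2001].include?(year)
  found
end

names = ['CVE-Modified', 'CVE-Recent', 'CVE-2014', 'CVE-2002']
puts feed_for('cve-2014-0001', names)
puts feed_for('CVE-1999-0001', names)
```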



# File 'lib/nvd_feed_api.rb', line 164

def cve(*arg_cve)
  return_value = nil
  raise 'no argument provided, 1 or more expected' if arg_cve.empty?

  if arg_cve.length == 1
    case arg_cve[0]
    when String
      raise 'bad CVE name' unless /^CVE-[0-9]{4}-[0-9]{4,}$/i.match?(arg_cve[0])

      year = /^CVE-([0-9]{4})-[0-9]{4,}$/i.match(arg_cve[0]).captures[0]
      matched_feed = nil
      feed_names = available_feeds
      feed_names.delete('CVE-Modified')
      feed_names.delete('CVE-Recent')
      feed_names.each do |feed|
        if /#{year}/.match?(feed)
          matched_feed = feed
          break
        end
      end
      # CVE-2002 feed (the 1st one) contains CVE from 1999 to 2002
      matched_feed = 'CVE-2002' if matched_feed.nil? && ('1999'..'2001').to_a.include?(year)
      raise "bad CVE year in #{arg_cve}" if matched_feed.nil?

      f = feeds(matched_feed)
      f.json_pull
      return_value = f.cve(arg_cve[0])
    when Array
      raise 'one of the provided arguments is not a String' unless arg_cve[0].all? { |x| x.is_a?(String) }
      raise 'bad CVE name' unless arg_cve[0].all? { |x| /^CVE-[0-9]{4}-[0-9]{4,}$/i.match?(x) }

      return_value = []
      # Sorting CVE can allow us to parse quicker
      # Upcase to be sure include? works
      cves_to_find = arg_cve[0].map(&:upcase).sort
      feeds_to_match = Set[]
      cves_to_find.each do |cve|
        feeds_to_match.add?(/^(CVE-[0-9]{4})-[0-9]{4,}$/i.match(cve).captures[0])
      end
      feed_names = available_feeds.to_set
      feed_names.delete('CVE-Modified')
      feed_names.delete('CVE-Recent')
      # CVE-2002 feed (the 1st one) contains CVE from 1999 to 2002
      virtual_feeds = ['CVE-1999', 'CVE-2000', 'CVE-2001']
      # So virtually add those feed...
      feed_names.merge(virtual_feeds)
      raise 'unexisting CVE year was provided in some CVE' unless feeds_to_match.subset?(feed_names)

      matched_feeds = feeds_to_match.intersection(feed_names)
      # and now that the intersection is done remove those virtual feeds and add CVE-2002 instead if needed
      unless matched_feeds.intersection(virtual_feeds.to_set).empty?
        matched_feeds.subtract(virtual_feeds)
        matched_feeds.add('CVE-2002')
      end
      feeds_arr = feeds(matched_feeds.to_a)
      feeds_arr.each do |feed|
        feed.json_pull
        cves_obj = feed.cve(cves_to_find.select { |cve| cve.include?(feed.name) })
        case cves_obj
        when Hash
          return_value.push(cves_obj)
        when Array
          return_value.push(*cves_obj)
        else
          raise 'cve() method of the feed instance returns wrong value'
        end
      end
    else
      raise "the provided argument (#{arg_cve[0]}) is nor a String or an Array"
    end
  else
    # Overloading a list of arguments as one array argument
    return_value = cve(arg_cve)
  end
  return return_value
end
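
The Array branch above works out which feeds to pull with Set operations, treating 1999 to 2001 as "virtual" feeds that are folded into CVE-2002. Here is a condensed sketch of just that step, assuming the IDs were already validated; feeds_needed and VIRTUAL_FEEDS are hypothetical names, and the sketch raises ArgumentError rather than the RuntimeError used in the listing.

```ruby
require 'set'

# Years that have no feed of their own on the NVD page.
VIRTUAL_FEEDS = Set['CVE-1999', 'CVE-2000', 'CVE-2001'].freeze

# Hypothetical helper: maps CVE IDs to the set of yearly feed names to pull.
# yearly_feed_names is assumed to exclude CVE-Modified and CVE-Recent.
def feeds_needed(cve_ids, yearly_feed_names)
  # "cve-2014-0001" -> "CVE-2014"
  wanted = cve_ids.map { |id| id.upcase[/\ACVE-[0-9]{4}/] }.to_set
  known = yearly_feed_names.to_set | VIRTUAL_FEEDS
  raise ArgumentError, 'unknown CVE year' unless wanted.subset?(known)

  # Swap any virtual feed for the real CVE-2002 feed.
  wanted = (wanted - VIRTUAL_FEEDS) << 'CVE-2002' if wanted.intersect?(VIRTUAL_FEEDS)
  wanted
end

ids = ['cve-2014-0001', 'CVE-2014-0002', 'CVE-2000-0001']
puts feeds_needed(ids, ['CVE-2014', 'CVE-2002']).to_a.inspect
```

Using a Set means each feed is requested once, however many IDs fall in the same year.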

#feeds ⇒ Array<Feed> #feeds(feed) ⇒ Feed #feeds(feed_arr) ⇒ Array<Feed> #feeds(feed, *) ⇒ Array<Feed>

Return feeds. Can only be called after #scrap.

Examples:

scraper.feeds # => all feeds
scraper.feeds('CVE-2010') # => return only CVE-2010 feed
scraper.feeds("CVE-2005", "CVE-2002") # => return CVE-2005 and CVE-2002 feeds

Overloads:

  • #feeds ⇒ Array<Feed>

    All the feeds.

    Returns:

    • (Array<Feed>)

      Attributes of all feeds. It's an array of Feed objects.

  • #feeds(feed) ⇒ Feed

    One feed.

    Parameters:

    • feed (String)

      Feed name as written on the NVD website. Names can be obtained with #available_feeds.

    Returns:

    • (Feed)

      Attributes of one feed. It's a Feed object.

  • #feeds(feed_arr) ⇒ Array<Feed>

    An array of feeds.

    Parameters:

    • feed_arr (Array<String>)

      An array of feed names as written on the NVD website. Names can be obtained with #available_feeds.

    Returns:

    • (Array<Feed>)

      Attributes of the feeds. It's an array of Feed objects.

  • #feeds(feed, *) ⇒ Array<Feed>

    Multiple feeds.

    Parameters:

    • feed (String)

      Feed name as written on the NVD website. Names can be obtained with #available_feeds.

    • * (String)

      As many feeds as you want.

    Returns:

    • (Array<Feed>)

      Attributes of the feeds. It's an array of Feed objects.
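
The four overloads above are served by one variadic method that inspects its arguments. The sketch below shows that dispatch pattern in isolation; REGISTRY and lookup are hypothetical, symbols stand in for Feed objects, and unknown names in an array are silently skipped here, which is a simplification.

```ruby
# Hypothetical registry standing in for the scraper's internal feed list.
REGISTRY = {
  'CVE-2010' => :feed_2010,
  'CVE-2005' => :feed_2005,
  'CVE-2002' => :feed_2002
}.freeze

def lookup(*names)
  # No argument: return everything.
  return REGISTRY.values if names.empty?
  # Several arguments: reuse the single-array branch.
  return lookup(names) if names.length > 1

  case names[0]
  when String
    REGISTRY[names[0]] # one object, or nil when the name is unknown
  when Array
    # Matches come back in registry order, not in argument order.
    REGISTRY.select { |name, _| names[0].include?(name) }.values
  else
    raise ArgumentError, "expected a String or an Array, got #{names[0].class}"
  end
end

puts lookup('CVE-2010').inspect
puts lookup('CVE-2002', 'CVE-2010').inspect
```

The return type depends on how the method is called: a single String yields one object, while an Array or several arguments yield an Array.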



# File 'lib/nvd_feed_api.rb', line 87

def feeds(*arg_feeds)
  raise 'call scrap method before using feeds method' if @feeds.nil?

  return_value = nil
  if arg_feeds.empty?
    return_value = @feeds
  elsif arg_feeds.length == 1
    case arg_feeds[0]
    when String
      @feeds.each do |feed| # feed is an object
        return_value = feed if arg_feeds.include?(feed.name)
      end
      # if nothing found return nil
    when Array
      raise 'one of the provided arguments is not a String' unless arg_feeds[0].all? { |x| x.is_a?(String) }

      # Sorting CVE can allow us to parse quicker
      # Upcase to be sure include? works
      # Does not use map(&:upcase) to preserve CVE-Recent and CVE-Modified
      feeds_to_find = arg_feeds[0].map { |x| x[0..2].upcase.concat(x[3..x.size]) }.sort
      matched_feeds = []
      @feeds.each do |feed| # feed is an object
        if feeds_to_find.include?(feed.name)
          matched_feeds.push(feed)
          feeds_to_find.delete(feed.name)
        elsif feeds_to_find.empty?
          break
        end
      end
      return_value = matched_feeds
      raise "#{feeds_to_find.join(', ')} are unexisting feeds" unless feeds_to_find.empty?
    else
      raise "the provided argument (#{arg_feeds[0]}) is nor a String or an Array"
    end
  else
    # Overloading a list of arguments as one array argument
    return_value = feeds(arg_feeds)
  end
  return return_value
end
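
Two details of the Array branch above are easy to miss: only the first three characters of each name are upcased (so CVE-Recent and CVE-Modified keep their casing), and matches are collected while iterating the scraped list, so they appear to come back in page order rather than argument order. A small sketch of both, with hypothetical helper names and a hypothetical page_order list; it raises ArgumentError where the listing raises a RuntimeError.

```ruby
# Upcase only the "cve" prefix: "cve-Recent" becomes "CVE-Recent",
# not "CVE-RECENT".
def normalize_feed_name(name)
  name[0..2].upcase + name[3..].to_s
end

# page_order: feed names in the order the scraped page lists them (assumed).
def match_feeds(requested, page_order)
  pending = requested.map { |name| normalize_feed_name(name) }
  # Array#delete returns the removed item (truthy) or nil when absent.
  matched = page_order.select { |name| pending.delete(name) }
  raise ArgumentError, "unknown feeds: #{pending.join(', ')}" unless pending.empty?

  matched
end

page = ['CVE-Modified', 'CVE-Recent', 'CVE-2015', 'CVE-2007']
puts match_feeds(['cve-2007', 'CVE-2015'], page).inspect
```

If the real method behaves the same way, a destructuring assignment such as `a, b = scraper.feeds("CVE-2007", "CVE-2015")` is safer when each result is checked by its name afterwards.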

#scrap ⇒ Integer

Note:

#scrap needs to be called only once, but it can be called again to refresh the feed list if the NVD feed page has changed.

Scrape and parse the NVD data feeds page to get the feeds and fill the #feeds attribute.

Returns:

  • (Integer)

    Number of scraped feeds.



# File 'lib/nvd_feed_api.rb', line 36

def scrap
  uri = URI(@url)
  html = Net::HTTP.get(uri)

  doc = Nokogiri::HTML(html)
  @feeds = []
  tmp_feeds = {}
  doc.css('#vuln-feed-table table.xml-feed-table tr[data-testid]').each do |tr|
    num, type = tr.attr('data-testid')[13..].split('-')
    case type
    when 'meta'
      tmp_feeds[num] = {}
      tmp_feeds[num][:name] = tr.css('td')[0].text
      tmp_feeds[num][:updated] = tr.css('td')[1].text
      tmp_feeds[num][:meta] = BASE + tr.css('td')[2].css('> a').attr('href').value
    when 'gz'
      tmp_feeds[num][:gz] = BASE + tr.css('td > a').attr('href').value
    when 'zip'
      tmp_feeds[num][:zip] = BASE + tr.css('td > a').attr('href').value
      @feeds.push(Feed.new(tmp_feeds[num][:name],
                           tmp_feeds[num][:updated],
                           tmp_feeds[num][:meta],
                           tmp_feeds[num][:gz],
                           tmp_feeds[num][:zip]))
    end
  end
  return @feeds.size
end
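
The loop above groups several table rows into one feed by reading a row number and a row type out of each data-testid value. The sketch below imitates that grouping on plain strings, without Nokogiri or network access. The 13-character prefix, the paths, FeedRow and group_rows are all hypothetical; the real attribute values on the NVD page are not shown in this document.

```ruby
# Simplified record; the real Feed class takes more attributes.
FeedRow = Struct.new(:name, :gz, :zip)

# Hypothetical 13-character prefix, matching the [13..] slice in the listing.
PREFIX = 'nvd-feed-row-'

# rows: [data-testid, value] pairs standing in for parsed <tr> elements.
def group_rows(rows)
  partial = Hash.new { |hash, num| hash[num] = {} }
  feeds = []
  rows.each do |testid, value|
    # "nvd-feed-row-0-meta" -> num "0", type "meta"
    num, type = testid[PREFIX.length..].split('-')
    partial[num][type.to_sym] = value
    # The zip row is assumed to come last, so it completes the record.
    # In this sketch the meta row carries the feed name.
    feeds << FeedRow.new(*partial[num].values_at(:meta, :gz, :zip)) if type == 'zip'
  end
  feeds
end

rows = [
  ['nvd-feed-row-0-meta', 'CVE-Modified'],
  ['nvd-feed-row-0-gz', '/example/modified.json.gz'],
  ['nvd-feed-row-0-zip', '/example/modified.json.zip']
]
puts group_rows(rows).first.name
```

A design consequence worth noting: because a record is only emitted on its zip row, a page layout that drops or reorders that row would yield fewer feeds, which the Integer returned by #scrap would reveal.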

#update_feeds(feed) ⇒ Boolean #update_feeds(feed_arr) ⇒ Array<Boolean> #update_feeds(feed, *) ⇒ Array<Boolean>

Re-scrape the NVD page and update the given Feed objects with the newly scraped attributes.

Examples:

s = NVDFeedScraper.new
s.scrap
f2015, f2017 = s.feeds("CVE-2015", "CVE-2017")
s.update_feeds(f2015, f2017) # => [false, false]

Overloads:

  • #update_feeds(feed) ⇒ Boolean

    One feed.

    Parameters:

    • feed (Feed)

      feed object to update.

    Returns:

    • (Boolean)

      true if the feed was updated, false if it wasn't.

  • #update_feeds(feed_arr) ⇒ Array<Boolean>

    An array of feeds.

    Parameters:

    • feed_arr (Array<Feed>)

      array of feed objects to update.

    Returns:

    • (Array<Boolean>)

      One Boolean per feed: true if the feed was updated, false if it wasn't.

  • #update_feeds(feed, *) ⇒ Array<Boolean>

    Multiple feeds.

    Parameters:

    • feed (Feed)

      feed object to update.

    • * (Feed)

      As many feed objects as you want.

    Returns:

    • (Array<Boolean>)

      One Boolean per feed: true if the feed was updated, false if it wasn't.



# File 'lib/nvd_feed_api.rb', line 260

def update_feeds(*arg_feed)
  return_value = false
  raise 'no argument provided, 1 or more expected' if arg_feed.empty?

  scrap
  if arg_feed.length == 1
    case arg_feed[0]
    when Feed
      new_feed = feeds(arg_feed[0].name)
      # update attributes
      return_value = arg_feed[0].update!(new_feed)
    when Array
      return_value = []
      arg_feed[0].each do |f|
        res = update_feeds(f)
        puts "#{f} not found" if res.nil?
        return_value.push(res)
      end
    else
      raise "the provided argument #{arg_feed[0]} is not a Feed or an Array"
    end
  else
    # Overloading a list of arguments as one array argument
    return_value = update_feeds(arg_feed)
  end
  return return_value
end
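
Since the Array branch above pushes one result per feed in the order the feeds were passed, the returned Booleans can be paired back with their feeds. A minimal sketch of that pairing, using a hypothetical StubFeed struct and a hard-coded result array instead of a real #update_feeds call:

```ruby
# Stand-in for Feed: only the name matters for this sketch.
StubFeed = Struct.new(:name)

# Hypothetical helper: names of the feeds whose result is true.
# results is assumed to be what update_feeds returned for the same feeds.
def updated_names(feeds, results)
  feeds.zip(results)
       .select { |_feed, changed| changed }
       .map { |feed, _changed| feed.name }
end

feeds = [StubFeed.new('CVE-2015'), StubFeed.new('CVE-2017')]
puts updated_names(feeds, [false, true]).inspect
```

Only the feeds reported as updated would then need a fresh json_pull.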