
Commit bb98df4

Implement Selenium login (#66)
* Implement Selenium login
* Fix headers and CSRF
* Check for multiple browsers
* Improve login flow
* remove redundant statement
* update docs
* linting
* add selenium
1 parent 3477cd7 commit bb98df4


3 files changed: 71 additions, 132 deletions


README.md

Lines changed: 24 additions & 30 deletions
@@ -3,6 +3,8 @@ OSINT Tool: Generate username lists from companies on LinkedIn.
 
 This is a pure web-scraper, no API key required. You use your valid LinkedIn username and password to login, it will create several lists of possible username formats for all employees of a company you point it at.
 
+Login is done with Selenium in a spawned browser window. Maintaining a working CLI login flow was a lot of work, and this resolves many issues while supporting login challenges and 2FA.
+
 Use an account with a lot of connections, otherwise you'll get crappy results. Adding a couple connections at the target company should help - this tool will work up to third degree connections. Note that [LinkedIn will cap search results](https://www.linkedin.com/help/linkedin/answer/129/what-you-get-when-you-search-on-linkedin?lang=en) to 1000 employees max. You can use the features '--geoblast' or '--keywords' to bypass this limit. Look at help below for more details.
 
 **WARNING**: LinkedIn has recently (Sept 2020) been hitting li2u users with the monthly commercial search limit. It's a bit mysterious as to when/why this happens. When you hit the limit, you won't be able to search again until the 1st of the month. If you know of a workaround, please let me know.
@@ -24,58 +26,50 @@ You'll need to provide the tool with LinkedIn's company name. You can find that
 
 Here's an example to pull all employees of Uber:
 ```
-$ python linkedin2username.py [email protected] uber-com
+$ python linkedin2username.py -c uber-com
 ```
 
 Here's an example to pull a shorter list and append the domain name @uber.com to them:
 ```
-$ python linkedin2username.py [email protected] uber-com -d 5 -n 'uber.com'
+$ python linkedin2username.py -c uber-com -d 5 -n 'uber.com'
 ```
 
 # Full Help
 ```
-usage: linkedin2username.py [-h] [-p PASSWORD] [-n DOMAIN] [-d DEPTH]
-                            [-s SLEEP]
-                            username company
+usage: linkedin2username.py [-h] -c COMPANY [-n DOMAIN] [-d DEPTH]
+                            [-s SLEEP] [-x PROXY] [-k KEYWORDS] [-g] [-o OUTPUT]
 
-positional arguments:
-  username              A valid LinkedIn username.
-  company               Company name.
+OSINT tool to generate lists of probable usernames from a given company's LinkedIn page.
+This tool may break when LinkedIn changes their site.
+Please open issues on GitHub to report any inconsistencies.
 
 optional arguments:
   -h, --help            show this help message and exit
-  -p PASSWORD, --password PASSWORD
-                        Specify your password on in clear-text on the command
-                        line. If not specified, will prompt and not display on
-                        screen.
+  -c COMPANY, --company COMPANY
+                        Company name exactly as typed in the company linkedin profile page URL.
   -n DOMAIN, --domain DOMAIN
-                        Append a domain name to username output. [example: '-n
-                        uber.com' would ouput [email protected]]
+                        Append a domain name to username output. [example: "-n uber.com" would
+
   -d DEPTH, --depth DEPTH
-                        Search depth. If unset, will try to grab them all.
+                        Search depth (how many loops of 25). If unset, will try to grab them
+                        all.
   -s SLEEP, --sleep SLEEP
-                        Seconds to sleep between pages. defaults to 3.
+                        Seconds to sleep between search loops. Defaults to 0.
   -x PROXY, --proxy PROXY
-                        HTTPS proxy server to use. Example: "-p
-                        https://localhost:8080" WARNING: WILL DISABLE SSL
-                        VERIFICATION.
-
+                        Proxy server to use. WARNING: WILL DISABLE SSL VERIFICATION.
+                        [example: "-p https://localhost:8080"]
   -k KEYWORDS, --keywords KEYWORDS
-                        Filter results by a a list of command separated
-                        keywords. Will do a separate loop for each keyword,
-                        potentially bypassing the 1,000 record limit.
-                        [example: "-k 'sales,human resources,information
-                        technology']
-  -g, --geoblast        Attempts to bypass the 1,000 record search limit by
-                        running multiple searches split across geographic
-                        regions.
+                        Filter results by a a list of command separated keywords.
+                        Will do a separate loop for each keyword,
+                        potentially bypassing the 1,000 record limit.
+                        [example: "-k 'sales,human resources,information technology']
+  -g, --geoblast        Attempts to bypass the 1,000 record search limit by running
+                        multiple searches split across geographic regions.
   -o OUTPUT, --output OUTPUT
                         Output Directory, defaults to li2u-output
 ```
 
 # Toubleshooting
 Sometimes LinkedIn does weird stuff or returns weird results. Sometimes it doesn't like you logging in from new locations. If something looks off, run the tool once or twice more. If it still isn't working, please open an issue.
 
-Multi-factor authentication (MFA, 2FA) is not supported in this tool.
-
 *This is a security research tool. Use only where granted explicit permission from the network owner.*
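
The README change above drops the positional `username` and the `-p` password flag, leaving `-c COMPANY` as the only required option. A minimal sketch of that interface with `argparse` is shown here. The `build_parser` name, the help strings, and the `int` types for `--depth` and `--sleep` are assumptions for illustration, not the tool's actual code, and only a subset of the flags is included.

```python
import argparse


def build_parser():
    """Build a parser for a subset of the flags in the help text (sketch)."""
    parser = argparse.ArgumentParser(prog='linkedin2username.py')
    # The company is now the only required argument; credentials are
    # entered in the browser window instead of on the command line.
    parser.add_argument('-c', '--company', required=True,
                        help='Company name as typed in the LinkedIn URL.')
    parser.add_argument('-n', '--domain', default='',
                        help='Domain name to append to each username.')
    # Assumption: depth and sleep are parsed as integers.
    parser.add_argument('-d', '--depth', type=int, default=None,
                        help='Search depth (loops of 25 results).')
    parser.add_argument('-s', '--sleep', type=int, default=0,
                        help='Seconds to sleep between search loops.')
    return parser


# Equivalent to: python linkedin2username.py -c uber-com -d 5 -n 'uber.com'
args = build_parser().parse_args(['-c', 'uber-com', '-d', '5', '-n', 'uber.com'])
print(args.company, args.depth, args.domain, args.sleep)
```

Because `-c` is marked `required=True`, running the parser with no arguments exits with a usage error rather than prompting for anything.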

linkedin2username.py

Lines changed: 46 additions & 102 deletions
@@ -13,12 +13,14 @@
 import re
 import time
 import argparse
-import getpass
 import json
 import urllib.parse
 import requests
 import urllib3
 
+from selenium import webdriver
+from selenium.common.exceptions import WebDriverException
+
 BANNER = r"""
 
 .__ .__________
@@ -192,17 +194,10 @@ def parse_arguments():
             ' to report any inconsistencies, and they will be quickly fixed.')
     parser = argparse.ArgumentParser(description=desc)
 
-    parser.add_argument('-u', '--username', type=str, action='store',
-                        required=True,
-                        help='A valid LinkedIn username.')
     parser.add_argument('-c', '--company', type=str, action='store',
                         required=True,
                         help='Company name exactly as typed in the company '
                              'linkedin profile page URL.')
-    parser.add_argument('-p', '--password', type=str, action='store',
-                        help='Specify your password in clear-text on the '
-                             'command line. If not specified, will prompt and '
-                             'obfuscate as you type.')
     parser.add_argument('-n', '--domain', type=str, action='store',
                         default='',
                         help='Append a domain name to username output. '
@@ -251,116 +246,59 @@ def parse_arguments():
         print("Sorry, keywords and geoblast are currently not compatible. Use one or the other.")
         sys.exit()
 
-    # If password is not passed in the command line, prompt for it
-    # in a more secure fashion (not shown on screen)
-    args.password = args.password or getpass.getpass()
-
     return args
 
 
-def login(args):
-    """Creates a new authenticated session.
+def get_webdriver():
+    """
+    Try to get a working Selenium browser driver
+    """
+    for browser in [webdriver.Firefox, webdriver.Chrome]:
+        try:
+            return browser()
+        except WebDriverException:
+            continue
+    return None
 
-    Note that a mobile user agent is used. Parsing using the desktop results
-    proved extremely difficult, as shared connections would be returned in
-    a manner that was indistinguishable from the desired targets.
 
-    The other header matters as well, otherwise advanced search functions
-    (region and keyword) will not work.
+def login():
+    """Creates a new authenticated session.
 
-    The function will check for common failure scenarios - the most common is
-    logging in from a new location. Accounts using multi-factor auth are not
-    yet supported and will produce an error.
+    This now uses Selenium because I got very tired playing cat/mouse
+    with LinkedIn's login process.
     """
-    session = requests.session()
+    driver = get_webdriver()
 
-    # The following are known errors that require the user to log in via the web
-    login_problems = ['challenge', 'captcha', 'manage-account', 'add-email']
+    if driver is None:
+        print("[!] Could not find a supported browser for Selenium. Exiting.")
+        sys.exit(1)
 
-    # Special options below when using a proxy server. Helpful for debugging
-    # the application in Burp Suite.
-    if args.proxy:
-        print("[!] Using a proxy, ignoring SSL errors. Don't get pwned.")
-        session.verify = False
-        urllib3.disable_warnings(category=urllib3.exceptions.InsecureRequestWarning)
-        session.proxies.update(args.proxy_dict)
+    driver.get("https://linkedin.com/login")
+
+    # Pause until the user lets us know the session is good.
+    print("[*] Log in to LinkedIn. Leave the browser open and press enter when ready...")
+    input("Ready? Press Enter!")
+
+    selenium_cookies = driver.get_cookies()
+    driver.quit()
+
+    # Initialize and return a requests session
+    session = requests.Session()
+    for cookie in selenium_cookies:
+        session.cookies.set(cookie['name'], cookie['value'])
 
-    # Our search and regex will work only with a mobile user agent and
-    # the correct REST protocol specified below.
+    # Add headers required for this tool to function
     mobile_agent = ('Mozilla/5.0 (Linux; U; Android 4.4.2; en-us; SCH-I535 '
                     'Build/KOT49H) AppleWebKit/534.30 (KHTML, like Gecko) '
                     'Version/4.0 Mobile Safari/534.30')
     session.headers.update({'User-Agent': mobile_agent,
                             'X-RestLi-Protocol-Version': '2.0.0',
                             'X-Li-Track': '{"clientVersion":"1.13.1665"}'})
 
-    # We wll grab an anonymous response to look for the CSRF token, which
-    # is required for our logon attempt.
-    anon_response = session.get('https://www.linkedin.com/login')
-    login_csrf = re.findall(r'name="loginCsrfParam" value="(.*?)"',
-                            anon_response.text)
-    if login_csrf:
-        login_csrf = login_csrf[0]
-    else:
-        print("Having trouble loading login page... try the command again.")
-        sys.exit()
-
-    # Define the data we will POST for our login.
-    auth_payload = {
-        'session_key': args.username,
-        'session_password': args.password,
-        'isJsEnabled': 'false',
-        'loginCsrfParam': login_csrf
-    }
-
-    # Perform the actual login. We disable redirects as we will use that 302
-    # as an indicator of a successful logon.
-    response = session.post('https://www.linkedin.com/checkpoint/lg/login-submit'
-                            '?loginSubmitSource=GUEST_HOME',
-                            data=auth_payload, allow_redirects=False)
-
-    # Define a successful login by the 302 redirect to the 'feed' page. Try
-    # to detect some other common logon failures and alert the user.
-    if response.status_code in (302, 303):
-        # Add CSRF token for all additional requests
-        session = set_csrf_token(session)
-        redirect = response.headers['Location']
-        if 'feed' in redirect:
-            return session
-        if 'add-phone' in redirect:
-            # Skip the prompt to add a phone number
-            url = 'https://www.linkedin.com/checkpoint/post-login/security/dismiss-phone-event'
-            response = session.post(url)
-            if response.status_code == 200:
-                return session
-            print("[!] Could not skip phone prompt. Log in via the web and then try again.\n")
-
-        elif any(x in redirect for x in login_problems):
-            print("[!] LinkedIn has a message for you that you need to address. "
-                  "Please log in using a web browser first, and then come back and try again.")
-        else:
-            # The below will detect some 302 that I don't yet know about.
-            print("[!] Some unknown redirection occurred. If this persists, please open an issue "
-                  "and include the info below:")
-            print("DEBUG INFO:")
-            print(f"LOCATION: {redirect}")
-            print(f"RESPONSE TEXT:\n{response.text}")
-
-        return False
-
-    # A failed logon doesn't generate a 302 at all, but simply responds with
-    # the logon page. We detect this here.
-    if '<title>LinkedIn Login' in response.text:
-        print("[!] Check your username and password and try again.\n")
-        return False
+    # Set the CSRF token
+    session = set_csrf_token(session)
 
-    # If we make it past everything above, we have no idea what happened.
-    # Oh well, we fail.
-    print("[!] Some unknown error logging in. If this persists, please open an issue on github.\n")
-    print("DEBUG INFO:")
-    print(f"RESPONSE CODE: {response.status_code}")
-    print(f"RESPONSE TEXT:\n{response.text}")
-    return False
+    return session
 
 
 def set_csrf_token(session):
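
The new `get_webdriver()` in the hunk above tries each browser constructor in turn and returns the first one that starts, or `None` if every attempt raises. The same "first working factory" pattern can be sketched without Selenium installed. The names `first_working`, `broken_browser`, and `working_browser` are hypothetical stand-ins for the real driver classes, and `RuntimeError` stands in for `WebDriverException`.

```python
def first_working(factories, expected_errors=(Exception,)):
    """Return the result of the first factory that does not raise, else None."""
    for factory in factories:
        try:
            return factory()
        except expected_errors:
            # This candidate is unavailable; fall through to the next one.
            continue
    return None


def broken_browser():
    # Hypothetical stand-in for a browser whose driver is not installed.
    raise RuntimeError("driver not installed")


def working_browser():
    # Hypothetical stand-in for a browser that launches successfully.
    return "fake-driver"


driver = first_working([broken_browser, working_browser],
                       expected_errors=(RuntimeError,))
print(driver)
```

Catching only the expected exception type, as the commit does with `WebDriverException`, lets unrelated errors surface instead of being silently skipped.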
@@ -717,14 +655,20 @@ def main():
     args = parse_arguments()
 
     # Instantiate a session by logging in to LinkedIn.
-    session = login(args)
+    session = login()
 
     # If we can't get a valid session, we quit now. Specific errors are
     # printed to the console inside the login() function.
     if not session:
         sys.exit()
 
-    print("[*] Successfully logged in.")
+    # Special options below when using a proxy server. Helpful for debugging
+    # the application in Burp Suite.
+    if args.proxy:
+        print("[!] Using a proxy, ignoring SSL errors. Don't get pwned.")
+        session.verify = False
+        urllib3.disable_warnings(category=urllib3.exceptions.InsecureRequestWarning)
+        session.proxies.update(args.proxy_dict)
 
     # Get basic company info
     print("[*] Trying to get company info...")
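
In this commit, `login()` copies only each cookie's `name` and `value` from Selenium into the `requests` session. Selenium's `get_cookies()` also reports fields such as `domain` and `path`, so one possible variation is to carry those across as well. The helper below is a sketch of that idea, not part of the commit: `cookie_kwargs` and the sample cookie are hypothetical, and it is an assumption that preserving these fields would be useful for this tool.

```python
def cookie_kwargs(selenium_cookie):
    """Map one Selenium cookie dict to keyword arguments for a cookie jar."""
    kwargs = {
        'name': selenium_cookie['name'],
        'value': selenium_cookie['value'],
    }
    # Copy the scoping fields only when the browser reported them.
    for key in ('domain', 'path'):
        if selenium_cookie.get(key):
            kwargs[key] = selenium_cookie[key]
    return kwargs


# Hypothetical cookie in the shape returned by driver.get_cookies().
sample = {
    'name': 'example_cookie',
    'value': 'abc123',
    'domain': '.linkedin.com',
    'path': '/',
    'secure': True,
    'httpOnly': False,
}
print(cookie_kwargs(sample))
```

With `requests`, the result could be applied as `session.cookies.set(**cookie_kwargs(cookie))`, since the cookie jar's `set` method accepts `domain` and `path` keyword arguments.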

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -1 +1,2 @@
 requests
+selenium
