Using the HDFS distributed file storage system from Python 3
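from pyhdfs import HdfsClient
# Create a connection; each entry in hosts is written as "host:port"
client = HdfsClient(hosts="testhdfs.org:50070", user_name="web_crawler")
client.get_home_directory()    # Get the user's home directory on HDFS
client.listdir(PATH)    # List the entries under the given HDFS path
client.copy_from_local(file_path, hdfs_path, overwrite=True)    # Copy a local file to the server; directories are not supported; overwrite=True replaces the file if it already exists
client.delete(PATH, recursive=True)    # Delete the given path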
hdfs_path must include the file name and its extension; otherwise the copy will not succeed.
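Since the destination must name the file itself, one way to avoid passing a bare directory is to derive hdfs_path from the local file name. The following is a minimal sketch; build_hdfs_path and upload_file are hypothetical helper names, not part of pyhdfs, and client is assumed to be an already connected HdfsClient.

```python
import ntpath
import posixpath


def build_hdfs_path(local_path, hdfs_dir):
    """Join hdfs_dir with the file name (including extension) of local_path."""
    # ntpath.basename splits on both "\\" and "/", so Windows-style and
    # POSIX-style local paths are both handled.
    file_name = ntpath.basename(local_path)
    if not file_name:
        raise ValueError("local_path must point to a file, not a directory")
    # HDFS paths always use forward slashes, hence posixpath.
    return posixpath.join(hdfs_dir, file_name)


def upload_file(client, local_path, hdfs_dir, overwrite=True):
    """Copy one local file into hdfs_dir and return the full destination path.

    client is assumed to expose copy_from_local(local, dest, overwrite=...),
    as the HdfsClient in the snippet above does.
    """
    hdfs_path = build_hdfs_path(local_path, hdfs_dir)
    client.copy_from_local(local_path, hdfs_path, overwrite=overwrite)
    return hdfs_path
```

For example, upload_file(client, r"C:\data\report.csv", "/user/web_crawler") would copy the file to /user/web_crawler/report.csv.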
If connecting with HdfsClient fails with an error like the following:
Traceback (most recent call last):
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "
    client.get_home_directory()
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 565, in get_home_directory
    return _json(self._get('/', 'GETHOMEDIRECTORY', **kwargs))['Path']
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 391, in _get
    return self._request('get', *args, **kwargs)
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 377, in _request
    _check_response(response, expected_status)
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 799, in _check_response
    remote_exception = _json(response)['RemoteException']
  File "C:\Users\billl\AppData\Local\Continuum\anaconda3\lib\site-packages\pyhdfs.py", line 793, in _json
    "Expected JSON. Is WebHDFS enabled? Got {!r}".format(response.text))
pyhdfs.HdfsException: Expected JSON. Is WebHDFS enabled? Got '\n\n\n\n
502 Server dropped connection
\n
The following error occurred while trying to access http://%2050070:50070/webhdfs/v1/?user.name=web_crawler&op=GETHOMEDIRECTORY:
\n502 Server dropped connection
\n
Generated Fri, 21 Dec 2018 02:03:18 GMT by Polipo on .\n\r\n'
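then the cause is usually an access or authentication problem: the account name or password may be wrong, the account may lack permission, or the local network may not be on the list of allowed networks. Note also that the URL requested in the output above begins with http://%2050070:50070/, that is, the host is " 50070" (%20 is an encoded space). This suggests the hosts string was split at a comma, so it is worth checking that hosts is written in host:port form, for example "testhdfs.org:50070".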
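Before looking into accounts or network access, it can help to rule out a malformed hosts value. The sketch below is only an illustration: find_suspicious_hosts is a hypothetical helper, and it assumes hosts is a comma-separated string of "host" or "host:port" entries, which is how the HdfsClient hosts parameter is understood here.

```python
def find_suspicious_hosts(hosts):
    """Return the entries of a comma-separated hosts string that do not
    look like "host" or "host:port"."""
    suspicious = []
    for entry in hosts.split(","):
        host, _, port = entry.partition(":")
        if (
            not host                      # empty host name
            or host != host.strip()       # stray whitespace around the host
            or host.isdigit()             # a bare number, probably a port
            or (port and not port.isdigit())  # non-numeric port
        ):
            suspicious.append(entry)
    return suspicious


# A comma between host and port makes the port look like a second host.
print(find_suspicious_hosts("testhdfs.org, 50070"))
# The host:port form yields no suspicious entries.
print(find_suspicious_hosts("testhdfs.org:50070"))
```

If the list is empty and the error persists, the authentication and network causes listed above are the next things to check.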