{"id":419,"date":"2018-03-09T16:23:40","date_gmt":"2018-03-09T16:23:40","guid":{"rendered":"http:\/\/www.igfasouza.com\/blog\/?p=419"},"modified":"2021-04-27T10:49:59","modified_gmt":"2021-04-27T09:49:59","slug":"jupyter-notebook","status":"publish","type":"post","link":"http:\/\/www.igfasouza.com\/blog\/jupyter-notebook\/","title":{"rendered":"Jupyter Notebook"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/03\/jupyter.png\" alt=\"\" width=\"883\" height=\"1023\" class=\"alignnone size-full wp-image-829\" srcset=\"http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/03\/jupyter.png 883w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/03\/jupyter-259x300.png 259w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/03\/jupyter-768x890.png 768w, http:\/\/www.igfasouza.com\/blog\/wp-content\/uploads\/2018\/03\/jupyter-624x723.png 624w\" sizes=\"auto, (max-width: 883px) 100vw, 883px\" \/><\/p>\n<p><b>How\u2019s it going there?<\/b><\/p>\n<p><a href=\"http:\/\/jupyter.org\/\" rel=\"noopener\" target=\"_blank\">Jupyter Notebook<\/a> is a popular application that enables you to edit, run and share Python code in a web view. It allows you to modify and re-execute parts of your code in a very flexible way. That\u2019s why Jupyter is a great tool to test and prototype programs.<\/p>\n<p><a href=\"http:\/\/spark.apache.org\/\" rel=\"noopener\" target=\"_blank\">Apache Spark<\/a> is a fast and powerful framework that provides an API to perform massively distributed processing over resilient sets of data.<\/p>\n<p>Get started with Spark and Jupyter together.<\/p>\n<p>Install Spark<\/p>\n<p>Visit the <a href=\"http:\/\/spark.apache.org\/downloads.html\" rel=\"noopener\" target=\"_blank\">Spark downloads page<\/a>. 
Select the latest Spark release, a prebuilt package for Hadoop, and download it directly.<br \/>\nUnzip it and move it to your \/opt folder:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"bash codecolorer\">$ <span class=\"kw2\">tar<\/span> <span class=\"re5\">-xzf<\/span> spark-1.2.0-bin-hadoop2.4.tgz<br \/>\n$ <span class=\"kw2\">mv<\/span> spark-1.2.0-bin-hadoop2.4 <span class=\"sy0\">\/<\/span>opt<span class=\"sy0\">\/<\/span>spark-1.2.0<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Create a symbolic link:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span><span class=\"kw2\">ln<\/span> <span class=\"re5\">-s<\/span> <span class=\"sy0\">\/<\/span>opt<span class=\"sy0\">\/<\/span>spark-1.2.0 <span class=\"sy0\">\/<\/span>opt<span class=\"sy0\">\/<\/span>spark<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>This way, you will be able to download and use multiple Spark versions.<\/p>\n<p>Finally, tell your bash (or zsh, etc.) where to find Spark. 
To do so, configure your $PATH variable by adding the following lines to your ~\/.bashrc (or ~\/.zshrc) file:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"kw3\">export<\/span> <span class=\"re2\">SPARK_HOME<\/span>=<span class=\"sy0\">\/<\/span>opt<span class=\"sy0\">\/<\/span>spark<br \/>\n<span class=\"kw3\">export<\/span> <span class=\"re2\">PATH<\/span>=<span class=\"re1\">$SPARK_HOME<\/span><span class=\"sy0\">\/<\/span>bin:<span class=\"re1\">$PATH<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Install Jupyter<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>pip <span class=\"kw2\">install<\/span> jupyter<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>You can run a regular Jupyter notebook by typing:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>jupyter notebook<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>There are two ways to make PySpark available in a Jupyter Notebook:<\/p>\n<p>1 &#8211; Configure the PySpark driver to use Jupyter Notebook: running pyspark will automatically open a Jupyter Notebook<br \/>\n2 &#8211; Launch a regular Jupyter Notebook and load PySpark using the findspark package<\/p>\n<p><strong>Option 1:<\/strong><\/p>\n<p>Update the PySpark driver environment variables: add these lines to your ~\/.bashrc (or ~\/.zshrc) file.<\/p>\n<div 
class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"kw3\">export<\/span> <span class=\"re2\">PYSPARK_DRIVER_PYTHON<\/span>=jupyter<br \/>\n<span class=\"kw3\">export<\/span> <span class=\"re2\">PYSPARK_DRIVER_PYTHON_OPTS<\/span>=<span class=\"st_h\">'notebook'<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Restart your terminal and launch PySpark:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>pyspark<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>This command should now start a Jupyter Notebook in your web browser. Create a new notebook by clicking on \u2018New\u2019 > \u2018Notebooks Python [default]\u2019.<\/p>\n<p><strong>Option 2:<\/strong><\/p>\n<p>Use the findspark package to make a SparkContext available in your code.<\/p>\n<p>The findspark package is not specific to Jupyter Notebook; you can use this trick in your favorite IDE too.<br \/>\nTo install findspark:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>pip <span class=\"kw2\">install<\/span> findspark<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Launch a regular Jupyter Notebook:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div 
class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>jupyter notebook<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>In your Python code, add:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"python codecolorer\"><span class=\"kw1\">import<\/span> findspark<br \/>\nfindspark.<span class=\"me1\">init<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">&quot;\/path_to_spark&quot;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Now you can try it out. I hope this guide helps you get started with Jupyter and Spark.<\/p>\n<p>Here is a Python example to test the setup:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/>8<br \/>9<br \/>10<br \/>11<br \/>12<br \/>13<br \/><\/div><\/td><td><div class=\"python codecolorer\"><span class=\"kw1\">import<\/span> findspark<br \/>\nfindspark.<span class=\"me1\">init<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">&quot;\/opt\/spark-1.4.1-bin-hadoop2.6\/&quot;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<span class=\"kw1\">import<\/span> pyspark<br \/>\n<span class=\"kw1\">import<\/span> <span class=\"kw3\">random<\/span><br \/>\nsc <span class=\"sy0\">=<\/span> pyspark.<span class=\"me1\">SparkContext<\/span><span class=\"br0\">&#40;<\/span>appName<span class=\"sy0\">=<\/span><span class=\"st0\">&quot;Pi&quot;<\/span><span class=\"br0\">&#41;<\/span><br \/>\nnum_samples <span class=\"sy0\">=<\/span> <span class=\"nu0\">100000000<\/span><br \/>\n<span class=\"kw1\">def<\/span> inside<span class=\"br0\">&#40;<\/span>p<span class=\"br0\">&#41;<\/span>: &nbsp; &nbsp;<br \/>\n&nbsp;x<span 
class=\"sy0\">,<\/span> y <span class=\"sy0\">=<\/span> <span class=\"kw3\">random<\/span>.<span class=\"kw3\">random<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span> <span class=\"kw3\">random<\/span>.<span class=\"kw3\">random<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n&nbsp;<span class=\"kw1\">return<\/span> x*x + y*y <span class=\"sy0\">&lt;<\/span> <span class=\"nu0\">1<\/span><br \/>\ncount <span class=\"sy0\">=<\/span> sc.<span class=\"me1\">parallelize<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">range<\/span><span class=\"br0\">&#40;<\/span><span class=\"nu0\">0<\/span><span class=\"sy0\">,<\/span> num_samples<span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span>.<span class=\"kw2\">filter<\/span><span class=\"br0\">&#40;<\/span>inside<span class=\"br0\">&#41;<\/span>.<span class=\"me1\">count<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\npi <span class=\"sy0\">=<\/span> <span class=\"nu0\">4<\/span> * count \/ num_samples<br \/>\n<span class=\"kw1\">print<\/span><span class=\"br0\">&#40;<\/span>pi<span class=\"br0\">&#41;<\/span><br \/>\nsc.<span class=\"me1\">stop<\/span><span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p><a href=\"https:\/\/toree.incubator.apache.org\/\" rel=\"noopener\" target=\"_blank\">Apache Toree<\/a> is a kernel for the Jupyter Notebook platform providing interactive access to Apache Spark.<\/p>\n<p>Install Toree.<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span><span class=\"kw2\">sudo<\/span> pip <span class=\"kw2\">install<\/span> 
toree<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Configure<\/p>\n<p>Set SPARK_HOME to point to the directory where you downloaded and expanded the Spark binaries.<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/><\/div><\/td><td><div class=\"bash codecolorer\">$ <span class=\"re2\">SPARK_HOME<\/span>=<span class=\"re1\">$HOME<\/span><span class=\"sy0\">\/<\/span>Downloads<span class=\"sy0\">\/<\/span>spark-x.x.x-bin-hadoopx.x<br \/>\n<br \/>\n$ jupyter toree <span class=\"kw2\">install<\/span> \\<br \/>\n&nbsp; --spark_home=<span class=\"re1\">$SPARK_HOME<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Start notebook.<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>jupyter notebook<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Test<\/p>\n<p>Point browser to http:\/\/localhost:8888.<br \/>\nThen open a new notebook using New > Toree.<\/p>\n<p>Test notebook with simple Spark Scala code.<\/p>\n<div class=\"codecolorer-container scala blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/><\/div><\/td><td><div class=\"scala codecolorer\">sc.<span class=\"me1\">parallelize<\/span><span class=\"br0\">&#40;<\/span><span class=\"nu0\">1<\/span> to <span class=\"nu0\">100<\/span><span class=\"br0\">&#41;<\/span>.<br \/>\n&nbsp; <span class=\"me1\">filter<\/span><span class=\"br0\">&#40;<\/span>x <span class=\"sy0\">=&gt;<\/span> x <span class=\"sy0\">%<\/span> <span class=\"nu0\">2<\/span> <span class=\"sy0\">==<\/span> 
<span class=\"nu0\">0<\/span><span class=\"br0\">&#41;<\/span>.<br \/>\n&nbsp; <span class=\"me1\">map<\/span><span class=\"br0\">&#40;<\/span>x <span class=\"sy0\">=&gt;<\/span> x <span class=\"sy0\">*<\/span> x<span class=\"br0\">&#41;<\/span>.<br \/>\n&nbsp; <span class=\"me1\">take<\/span><span class=\"br0\">&#40;<\/span><span class=\"nu0\">10<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Here you can press Tab for auto-completion.<\/p>\n<p>To run Jupyter with R, install IRkernel:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>conda <span class=\"kw2\">install<\/span> <span class=\"re5\">-c<\/span> r ipython-notebook r-irkernel<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Now open R and install the packages that the R kernel needs in the IPython notebook:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"python codecolorer\">install.<span class=\"me1\">packages<\/span><span class=\"br0\">&#40;<\/span>c<span class=\"br0\">&#40;<\/span><span class=\"st0\">'rzmq'<\/span><span class=\"sy0\">,<\/span><span class=\"st0\">'repr'<\/span><span class=\"sy0\">,<\/span><span class=\"st0\">'IRkernel'<\/span><span class=\"sy0\">,<\/span><span class=\"st0\">'IRdisplay'<\/span><span class=\"br0\">&#41;<\/span><span class=\"sy0\">,<\/span> repos <span class=\"sy0\">=<\/span> <span class=\"st0\">'http:\/\/irkernel.github.io\/'<\/span><span class=\"sy0\">,<\/span> <span class=\"kw2\">type<\/span> <span class=\"sy0\">=<\/span> <span class=\"st0\">'source'<\/span><span 
class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>After the packages have been downloaded and installed, run this and quit:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/><\/div><\/td><td><div class=\"python codecolorer\">IRkernel::installspec<span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\nquit<span class=\"br0\">&#40;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Start the notebook and check New > R.<\/p>\n<p>You can also install Jupyter on a Raspberry Pi:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/><\/div><\/td><td><div class=\"bash codecolorer\">$ <span class=\"kw2\">sudo<\/span> <span class=\"kw2\">apt-get install<\/span> python3-matplotlib<br \/>\n$ <span class=\"kw2\">sudo<\/span> <span class=\"kw2\">apt-get install<\/span> python3-scipy<br \/>\n$ pip3 <span class=\"kw2\">install<\/span> <span class=\"re5\">--upgrade<\/span> pip<br \/>\n$ reboot<br \/>\n$ <span class=\"kw2\">sudo<\/span> pip3 <span class=\"kw2\">install<\/span> jupyter<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p>To start it:<\/p>\n<div class=\"codecolorer-container bash blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/><\/div><\/td><td><div class=\"bash codecolorer\"><span class=\"co4\">$ <\/span>jupyter-notebook<\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Simple Python example:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td 
class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/><\/div><\/td><td><div class=\"python codecolorer\"><span class=\"kw1\">import<\/span> pyspark<br \/>\nsc <span class=\"sy0\">=<\/span> pyspark.<span class=\"me1\">SparkContext<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">'local[*]'<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n<span class=\"co1\"># do something to prove it works<\/span><br \/>\nrdd <span class=\"sy0\">=<\/span> sc.<span class=\"me1\">parallelize<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">range<\/span><span class=\"br0\">&#40;<\/span><span class=\"nu0\">1000<\/span><span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><br \/>\nrdd.<span class=\"me1\">takeSample<\/span><span class=\"br0\">&#40;<\/span><span class=\"kw2\">False<\/span><span class=\"sy0\">,<\/span> <span class=\"nu0\">5<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Simple R example:<\/p>\n<div class=\"codecolorer-container python blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/>3<br \/>4<br \/>5<br \/>6<br \/>7<br \/><\/div><\/td><td><div class=\"python codecolorer\">library<span class=\"br0\">&#40;<\/span>SparkR<span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n<span class=\"kw1\">as<\/span> <span class=\"sy0\">&lt;<\/span>- sparkR.<span class=\"me1\">session<\/span><span class=\"br0\">&#40;<\/span><span class=\"st0\">&quot;local[*]&quot;<\/span><span class=\"br0\">&#41;<\/span><br \/>\n<br \/>\n<span class=\"co1\"># do something to prove it works<\/span><br \/>\ndf <span class=\"sy0\">&lt;<\/span>- <span class=\"kw1\">as<\/span>.<span class=\"me1\">DataFrame<\/span><span class=\"br0\">&#40;<\/span>iris<span class=\"br0\">&#41;<\/span><br \/>\nhead<span class=\"br0\">&#40;<\/span><span class=\"kw2\">filter<\/span><span 
class=\"br0\">&#40;<\/span>df<span class=\"sy0\">,<\/span> df$Petal_Width <span class=\"sy0\">&gt;<\/span> <span class=\"nu0\">0.2<\/span><span class=\"br0\">&#41;<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Simple Scala example:<\/p>\n<div class=\"codecolorer-container scala blackboard\" style=\"overflow:auto;white-space:nowrap;\"><table cellspacing=\"0\" cellpadding=\"0\"><tbody><tr><td class=\"line-numbers\"><div>1<br \/>2<br \/><\/div><\/td><td><div class=\"scala codecolorer\"><a href=\"http:\/\/scala-lang.org\"><span class=\"kw1\">val<\/span><\/a> rdd <span class=\"sy0\">=<\/span> sc.<span class=\"me1\">parallelize<\/span><span class=\"br0\">&#40;<\/span><span class=\"nu0\">0<\/span> to <span class=\"nu0\">999<\/span><span class=\"br0\">&#41;<\/span><br \/>\nrdd.<span class=\"me1\">takeSample<\/span><span class=\"br0\">&#40;<\/span><a href=\"http:\/\/scala-lang.org\"><span class=\"kw1\">false<\/span><\/a>, <span class=\"nu0\">5<\/span><span class=\"br0\">&#41;<\/span><\/div><\/td><\/tr><\/tbody><\/table><\/div>\n<p><\/p>\n<p>Use the pre-configured SparkContext in the variable sc.<\/p>\n<p><strong>Links<\/strong><\/p>\n<p>Apache Toree<br \/>\n<a href=\"https:\/\/github.com\/asimjalis\/apache-toree-quickstart\" rel=\"noopener\" target=\"_blank\">https:\/\/github.com\/asimjalis\/apache-toree-quickstart<\/a><\/p>\n<p>R on Jupyter<br \/>\n<a href=\"https:\/\/discuss.analyticsvidhya.com\/t\/how-to-run-r-on-jupyter-ipython-notebooks\/5512\/2\" rel=\"noopener\" target=\"_blank\">https:\/\/discuss.analyticsvidhya.com\/t\/how-to-run-r-on-jupyter-ipython-notebooks\/5512\/2<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How\u2019s it going there? Jupyter Notebook is a popular application that enables you to edit, run and share Python code in a web view. 
It allows you to modify and re-execute parts of your code in a very flexible way.&hellip; <a href=\"http:\/\/www.igfasouza.com\/blog\/jupyter-notebook\/\" class=\"more-link\">Continue Reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":829,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-419","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts\/419","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/comments?post=419"}],"version-history":[{"count":14,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts\/419\/revisions"}],"predecessor-version":[{"id":1193,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/posts\/419\/revisions\/1193"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/media\/829"}],"wp:attachment":[{"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/media?parent=419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/categories?post=419"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.igfasouza.com\/blog\/wp-json\/wp\/v2\/tags?post=419"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}