{"id":10618,"date":"2021-03-19T02:00:49","date_gmt":"2021-03-19T09:00:49","guid":{"rendered":"http:\/\/softwareengineeringdaily.com\/?p=10618"},"modified":"2021-03-19T14:17:00","modified_gmt":"2021-03-19T21:17:00","slug":"datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan","status":"publish","type":"post","link":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/","title":{"rendered":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan"},"content":{"rendered":"<p><img data-attachment-id=\"2475\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2016\/04\/19\/googles-container-management-brendan-burns\/brendan-burns\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2016\/04\/brendan-burns.jpg?fit=175%2C175&amp;ssl=1\" data-orig-size=\"175,175\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"brendan-burns\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2016\/04\/brendan-burns.jpg?fit=175%2C175&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2016\/04\/brendan-burns.jpg?fit=175%2C175&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"alignright size-full wp-image-2475\" style=\"border-radius: 50%; border: 1px solid #000000; max-width: 175px; max-height: 175px;\" 
src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/03\/datahublogo.png?resize=175%2C175&#038;ssl=1\" width=\"175\" height=\"175\" data-recalc-dims=\"1\" \/><\/p>\n<p><span style=\"font-weight: 400;\">As the volume and scope of data collected by an organization grow, tasks such as data discovery and data management grow in complexity. Simply put, the more data there is, the harder it is for users such as data analysts to find what they\u2019re looking for. A metadata hub helps manage Big Data by providing metadata search and discovery tools, along with a centralized hub that presents a holistic view of the data ecosystem. DataHub is LinkedIn\u2019s open-sourced metadata search and discovery tool. It is LinkedIn\u2019s second-generation metadata hub, succeeding WhereHows.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Pardhu Gunnam and Mars Lan join us today from Metaphor, a company they co-founded to build out the DataHub ecosystem. Pardhu and Mars, and the other co-founders of Metaphor, were part of the team at LinkedIn that built the DataHub project. They join the show today to talk about how DataHub democratizes data access for an organization, why the new DataHub architecture was critical to LinkedIn\u2019s growth, and what we can expect to see from the DataHub project moving forward.<\/span><\/p>\n<p>Sponsorship inquiries:\u00a0<a href=\"mailto:sponsor@softwareengineeringdaily.com\" target=\"_blank\" rel=\"noopener noreferrer\">sponsor@softwareengineeringdaily.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As the volume and scope of data collected by an organization grow, tasks such as data discovery and data management grow in complexity. Simply put, the more data there is, the harder it is for users such as data analysts to find what they\u2019re looking for. 
A metadata hub helps manage Big Data by providing<\/p>\n","protected":false},"author":3,"featured_media":10682,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"footnotes":"","jetpack_publicize_message":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan @datahubio @MetaphorData @mars_lan @PardhuGunnam","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[1363,2143,1078,14],"tags":[4507,4504,336,4503,4506,4505,4502,4508],"jetpack_publicize_connections":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan - Software Engineering Daily<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan - Software Engineering Daily\" \/>\n<meta property=\"og:description\" content=\"As the volume and scope of data collected by an organization grow, tasks such as data discovery and data management grow in complexity. Simply put, the more data there is, the harder it is for users such as data analysts to find what they\u2019re looking for. 
A metadata hub helps manage Big Data by providing\" \/>\n<meta property=\"og:url\" content=\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\" \/>\n<meta property=\"og:site_name\" content=\"Software Engineering Daily\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-19T09:00:49+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-03-19T21:17:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/03\/DataHubhero.png?fit=1300%2C452\" \/>\n\t<meta property=\"og:image:width\" content=\"1300\" \/>\n\t<meta property=\"og:image:height\" content=\"452\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"SE Daily\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@software_daily\" \/>\n<meta name=\"twitter:site\" content=\"@software_daily\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"SE Daily\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\"},\"author\":{\"name\":\"SE Daily\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8\"},\"headline\":\"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan\",\"datePublished\":\"2021-03-19T09:00:49+00:00\",\"dateModified\":\"2021-03-19T21:17:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\"},\"wordCount\":192,\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"keywords\":[\"data search\",\"DataHub\",\"LinkedIn\",\"Mars Lan\",\"metadata hub\",\"Metaphor\",\"Pardhu Gunnam\",\"WhereHows\"],\"articleSection\":[\"All Content\",\"Exclusive Content\",\"Open Source\",\"Podcast\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\",\"url\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\",\"name\":\"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan - Software Engineering 
Daily\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\"},\"datePublished\":\"2021-03-19T09:00:49+00:00\",\"dateModified\":\"2021-03-19T21:17:00+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/softwareengineeringdaily.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"name\":\"Software Engineering Daily\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\",\"name\":\"Software Engineering 
Daily\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1\",\"width\":296,\"height\":139,\"caption\":\"Software Engineering Daily\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/twitter.com\/software_daily\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8\",\"name\":\"SE Daily\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg\",\"caption\":\"SE Daily\"},\"description\":\"The SE Daily podcast.\",\"sameAs\":[\"https:\/\/softwareengineeringdaily.com\"],\"url\":\"https:\/\/softwareengineeringdaily.com\/author\/erikawho\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan - Software Engineering Daily","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/","og_locale":"en_US","og_type":"article","og_title":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan - Software Engineering Daily","og_description":"As the volume and scope of data collected by an organization grow, tasks such as data discovery and data management grow in complexity. Simply put, the more data there is, the harder it is for users such as data analysts to find what they\u2019re looking for. A metadata hub helps manage Big Data by providing","og_url":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/","og_site_name":"Software Engineering Daily","article_published_time":"2021-03-19T09:00:49+00:00","article_modified_time":"2021-03-19T21:17:00+00:00","og_image":[{"width":1300,"height":452,"url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/03\/DataHubhero.png?fit=1300%2C452","type":"image\/png"}],"author":"SE Daily","twitter_card":"summary_large_image","twitter_creator":"@software_daily","twitter_site":"@software_daily","twitter_misc":{"Written by":"SE Daily","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/#article","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/"},"author":{"name":"SE Daily","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8"},"headline":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan","datePublished":"2021-03-19T09:00:49+00:00","dateModified":"2021-03-19T21:17:00+00:00","mainEntityOfPage":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/"},"wordCount":192,"publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"keywords":["data search","DataHub","LinkedIn","Mars Lan","metadata hub","Metaphor","Pardhu Gunnam","WhereHows"],"articleSection":["All Content","Exclusive Content","Open Source","Podcast"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/","url":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/","name":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan - Software Engineering 
Daily","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/#website"},"datePublished":"2021-03-19T09:00:49+00:00","dateModified":"2021-03-19T21:17:00+00:00","breadcrumb":{"@id":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/softwareengineeringdaily.com\/2021\/03\/19\/datahub-open-source-data-lake-with-pardhu-gunnam-and-mars-lan\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/softwareengineeringdaily.com\/"},{"@type":"ListItem","position":2,"name":"Datahub: Open Source Data Lake with Pardhu Gunnam and Mars Lan"}]},{"@type":"WebSite","@id":"https:\/\/softwareengineeringdaily.com\/#website","url":"https:\/\/softwareengineeringdaily.com\/","name":"Software Engineering Daily","description":"","publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/softwareengineeringdaily.com\/#organization","name":"Software Engineering Daily","url":"https:\/\/softwareengineeringdaily.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1","width":296,"height":139,"caption":"Software 
Engineering Daily"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/twitter.com\/software_daily"]},{"@type":"Person","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8","name":"SE Daily","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg","caption":"SE Daily"},"description":"The SE Daily podcast.","sameAs":["https:\/\/softwareengineeringdaily.com"],"url":"https:\/\/softwareengineeringdaily.com\/author\/erikawho\/"}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2021\/03\/DataHubhero.png?fit=1300%2C452&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p7GuoD-2Lg","_links":{"self":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/10618"}],"collection":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/comments?post=10618"}],"version-history":[{"count":0,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/10618\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media\/10682"}],"wp:attachment":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media?parent=10618"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2
\/categories?post=10618"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/tags?post=10618"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}