{"id":8392,"date":"2019-11-28T02:00:52","date_gmt":"2019-11-28T10:00:52","guid":{"rendered":"http:\/\/softwareengineeringdaily.com\/?p=8392"},"modified":"2021-09-27T07:37:49","modified_gmt":"2021-09-27T14:37:49","slug":"ubers-data-platform-with-zhenxiao-luo-holiday-repeat","status":"publish","type":"post","link":"https:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/","title":{"rendered":"Uber&#8217;s Data Platform with Zhenxiao Luo Holiday Repeat"},"content":{"rendered":"<p><img data-attachment-id=\"2475\" data-permalink=\"https:\/\/softwareengineeringdaily.com\/2016\/04\/19\/googles-container-management-brendan-burns\/brendan-burns\/\" data-orig-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2016\/04\/brendan-burns.jpg?fit=175%2C175&amp;ssl=1\" data-orig-size=\"175,175\" data-comments-opened=\"0\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"brendan-burns\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2016\/04\/brendan-burns.jpg?fit=175%2C175&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2016\/04\/brendan-burns.jpg?fit=175%2C175&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"alignright size-full wp-image-2475\" style=\"border-radius: 50%; border: 1px solid #000000; max-width: 175px; max-height: 175px;\" src=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2018\/05\/ZhenxiaoLuo.jpeg?resize=175%2C175&#038;ssl=1\" 
width=\"175\" height=\"175\" data-recalc-dims=\"1\" \/><\/p>\n<p><em>Originally published May 24, 2018<\/em><\/p>\n<p><span style=\"font-weight: 400;\">When a user takes a ride on Uber, the app on the user\u2019s phone is communicating with Uber\u2019s backend infrastructure, which is writing to a database that maintains the state of that user\u2019s activity. This database is known as a transactional database or \u201cOLTP\u201d (online transaction processing). Every active user and driver and UberEATS restaurant is writing data to the transactional data store.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Periodically, that data is copied from the transactional data system to a different data storage system, where that data can be queried for large-scale data analysis. For example, if a data scientist at Uber wants to get the average number of miles that a given user rode in February, that data scientist would issue a query to the analytical data cluster.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Uber uses the Hadoop Distributed File System (HDFS) to store analytical data. On this file system, Uber has a version history of all of the company\u2019s useful historical data. It holds trip history, rider activity, driver activity&#8211;every data point that is in the transactional database&#8211;but in a file format that is easier to query for large-scale processing. This file format is known as Parquet.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Data scientists, machine learning engineers, and real-time application developers all depend on the massive quantities of data that are stored in these Parquet files on Uber\u2019s HDFS cluster. To simplify access to that data for many different clients, Uber uses Presto, an analytical query engine originally built at Facebook. 
<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Presto translates SQL queries into whatever query language is necessary to access the underlying storage medium&#8211;whether that storage system is an Elasticsearch cluster, a set of Parquet files, or a relational database. Presto is useful because it simplifies the relationship between data engineers and the application developers who are building on top of the data engineering infrastructure.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In today\u2019s show, Zhenxiao Luo joins to give an end-to-end description of Uber\u2019s data infrastructure&#8211;from the ingest point of the OLTP database to the OLAP data storage system on HDFS, to the wide range of data systems and applications that run on top of that OLAP data.<\/span><\/p>\n<h2><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>Originally published May 24, 2018 When a user takes a ride on Uber, the app on the user\u2019s phone is communicating with Uber\u2019s backend infrastructure, which is writing to a database that maintains the state of that user\u2019s activity. This database is known as a transactional database or \u201cOLTP\u201d (online transaction processing). 
Every active user<\/p>\n","protected":false},"author":3,"featured_media":7913,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[1363,1081,2143,14],"tags":[2120,61,137,2119,343,256,40,2118],"jetpack_publicize_connections":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Uber&#039;s Data Platform with Zhenxiao Luo Holiday Repeat - Software Engineering Daily<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Uber&#039;s Data Platform with Zhenxiao Luo Holiday Repeat - Software Engineering Daily\" \/>\n<meta property=\"og:description\" content=\"Originally published May 24, 2018 When a user takes a ride on Uber, the app on the user\u2019s phone is communicating with Uber\u2019s backend infrastructure, which is writing to a database that maintains the state of that user\u2019s activity. This database is known as a transactional database or \u201cOLTP\u201d (online transaction processing). 
Every active user\" \/>\n<meta property=\"og:url\" content=\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\" \/>\n<meta property=\"og:site_name\" content=\"Software Engineering Daily\" \/>\n<meta property=\"article:published_time\" content=\"2019-11-28T10:00:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-09-27T14:37:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2019\/08\/uber.jpeg?fit=4974%2C2982\" \/>\n\t<meta property=\"og:image:width\" content=\"4974\" \/>\n\t<meta property=\"og:image:height\" content=\"2982\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"SE Daily\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@software_daily\" \/>\n<meta name=\"twitter:site\" content=\"@software_daily\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"SE Daily\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/#article\",\"isPartOf\":{\"@id\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\"},\"author\":{\"name\":\"SE Daily\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8\"},\"headline\":\"Uber&#8217;s Data Platform with Zhenxiao Luo Holiday Repeat\",\"datePublished\":\"2019-11-28T10:00:52+00:00\",\"dateModified\":\"2021-09-27T14:37:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\"},\"wordCount\":360,\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"keywords\":[\"data infrastructure\",\"Hadoop\",\"HDFS\",\"OLTP\",\"Parquet\",\"Presto\",\"Uber\",\"Zhenxiao Luo\"],\"articleSection\":[\"All Content\",\"Data\",\"Exclusive Content\",\"Podcast\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\",\"url\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\",\"name\":\"Uber's Data Platform with Zhenxiao Luo Holiday Repeat - Software Engineering 
Daily\",\"isPartOf\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\"},\"datePublished\":\"2019-11-28T10:00:52+00:00\",\"dateModified\":\"2021-09-27T14:37:49+00:00\",\"breadcrumb\":{\"@id\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/softwareengineeringdaily.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Uber&#8217;s Data Platform with Zhenxiao Luo Holiday Repeat\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#website\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"name\":\"Software Engineering Daily\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#organization\",\"name\":\"Software Engineering 
Daily\",\"url\":\"https:\/\/softwareengineeringdaily.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1\",\"width\":296,\"height\":139,\"caption\":\"Software Engineering Daily\"},\"image\":{\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/twitter.com\/software_daily\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8\",\"name\":\"SE Daily\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg\",\"caption\":\"SE Daily\"},\"description\":\"The SE Daily podcast.\",\"sameAs\":[\"https:\/\/softwareengineeringdaily.com\"],\"url\":\"https:\/\/softwareengineeringdaily.com\/author\/erikawho\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Uber's Data Platform with Zhenxiao Luo Holiday Repeat - Software Engineering Daily","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/","og_locale":"en_US","og_type":"article","og_title":"Uber's Data Platform with Zhenxiao Luo Holiday Repeat - Software Engineering Daily","og_description":"Originally published May 24, 2018 When a user takes a ride on Uber, the app on the user\u2019s phone is communicating with Uber\u2019s backend infrastructure, which is writing to a database that maintains the state of that user\u2019s activity. This database is known as a transactional database or \u201cOLTP\u201d (online transaction processing). Every active user","og_url":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/","og_site_name":"Software Engineering Daily","article_published_time":"2019-11-28T10:00:52+00:00","article_modified_time":"2021-09-27T14:37:49+00:00","og_image":[{"width":4974,"height":2982,"url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2019\/08\/uber.jpeg?fit=4974%2C2982","type":"image\/jpeg"}],"author":"SE Daily","twitter_card":"summary_large_image","twitter_creator":"@software_daily","twitter_site":"@software_daily","twitter_misc":{"Written by":"SE Daily","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/#article","isPartOf":{"@id":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/"},"author":{"name":"SE Daily","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8"},"headline":"Uber&#8217;s Data Platform with Zhenxiao Luo Holiday Repeat","datePublished":"2019-11-28T10:00:52+00:00","dateModified":"2021-09-27T14:37:49+00:00","mainEntityOfPage":{"@id":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/"},"wordCount":360,"publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"keywords":["data infrastructure","Hadoop","HDFS","OLTP","Parquet","Presto","Uber","Zhenxiao Luo"],"articleSection":["All Content","Data","Exclusive Content","Podcast"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/","url":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/","name":"Uber's Data Platform with Zhenxiao Luo Holiday Repeat - Software Engineering 
Daily","isPartOf":{"@id":"https:\/\/softwareengineeringdaily.com\/#website"},"datePublished":"2019-11-28T10:00:52+00:00","dateModified":"2021-09-27T14:37:49+00:00","breadcrumb":{"@id":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/softwareengineeringdaily.com\/2019\/11\/28\/ubers-data-platform-with-zhenxiao-luo-holiday-repeat\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/softwareengineeringdaily.com\/"},{"@type":"ListItem","position":2,"name":"Uber&#8217;s Data Platform with Zhenxiao Luo Holiday Repeat"}]},{"@type":"WebSite","@id":"https:\/\/softwareengineeringdaily.com\/#website","url":"https:\/\/softwareengineeringdaily.com\/","name":"Software Engineering Daily","description":"","publisher":{"@id":"https:\/\/softwareengineeringdaily.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/softwareengineeringdaily.com\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/softwareengineeringdaily.com\/#organization","name":"Software Engineering Daily","url":"https:\/\/softwareengineeringdaily.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/","url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1","contentUrl":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2022\/01\/cropped-logo-new.png?fit=296%2C139&ssl=1","width":296,"height":139,"caption":"Software Engineering 
Daily"},"image":{"@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/twitter.com\/software_daily"]},{"@type":"Person","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/822f06fe7d6f895baba29a9c0a3aa6c8","name":"SE Daily","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/softwareengineeringdaily.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b92f4cf3dc4d94f73834f83e2a22a372?s=96&d=retro&r=pg","caption":"SE Daily"},"description":"The SE Daily podcast.","sameAs":["https:\/\/softwareengineeringdaily.com"],"url":"https:\/\/softwareengineeringdaily.com\/author\/erikawho\/"}]}},"jetpack_sharing_enabled":true,"jetpack_featured_media_url":"https:\/\/i0.wp.com\/softwareengineeringdaily.com\/wp-content\/uploads\/2019\/08\/uber.jpeg?fit=4974%2C2982&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p7GuoD-2bm","_links":{"self":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/8392"}],"collection":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/comments?post=8392"}],"version-history":[{"count":0,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/posts\/8392\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media\/7913"}],"wp:attachment":[{"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/media?parent=8392"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/categories?post=8392
"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/softwareengineeringdaily.com\/wp-json\/wp\/v2\/tags?post=8392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}