layout | title | chapter | part | author | featured | permalink
---|---|---|---|---|---|---
chapter | Beyond Open Data: The Data-Driven City | 15 | 4 | | true | /part-4/beyond-open-data-the-data-driven-city/
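- Author: Michael Flowers
- Translator: Sandra Chiu, 2014/3/8
A data-driven city is one that delivers its core services more effectively. Being data-driven is not primarily a technology challenge; it is a challenge of organizational direction and leadership.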
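College seminars, management consultants, and Wall Street Journal coverage have all started to focus on so-called "big data." The usual definition is a body of information far larger than what we are used to analyzing, produced frequently, generated by machines, and often tagged with geolocation. The applications of big data tend to be an afterthought; the conversation centers on how much data there is and how to store it, under the general assumption that more is better. Big data holds promise, but it should not be confused with being data-driven.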
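Discussions of big data often lose sight of outcomes because applications are treated as an afterthought. We have a fire hose of information, but a fire hose is valuable only when it is pointed at a fire; data by itself is worth little. Collecting traffic-pattern information in a CSV file is not helpful in itself. It becomes valuable when it is turned into traffic maps and city planners use it to redesign services. What really matters, then, is not the CSV file, the map, or the traffic pattern but the outcome: using the information to improve traffic, shorten commutes, reduce accidents, improve the city's air quality, and build crosswalks and bike lanes that cut collisions with pedestrians and cyclists, so that we live faster, cleaner, and safer lives.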
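If you are looking for well-managed, data-driven institutions, look no further than American cities. City governments provide the backbone services of modern life: the water we brush our teeth with each morning; the buses, subways, and roads that take us to work; the crews that clean the streets and keep the parks green; the schools that educate our children; and the police and fire departments that keep people safe. More and more Americans are choosing to live in cities. Drawn by economic and cultural opportunity, Americans and immigrants alike are moving in. They do not expect spacious apartments or luxurious commutes; in fact, they often trade away housing or transportation quality. They keep coming because they want an urban life. This urban migration places heavier demands on city infrastructure: water, sewage, fire protection, public safety, housing, healthcare, education, parks, and more. Meanwhile, cities lack the resources to meet the fast-growing demand. Given the economic conditions of the past decade, cities face lower per-capita tax revenue, which forces mayors and city managers to do more with less. In practice, that means finding new ways to get better outcomes from existing systems and processes.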
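A data-driven city uses data intelligently to deliver better core services. Transparency, open data, and innovation are all important parts of the modern civic identity, especially in a city like New York that is focused on its position as a technology leader. What a data-driven city really cares about, though, is delivering core services more effectively: smarter, risk-based resource allocation; information shared between agencies to support better decisions; and data used in ways that fit the established daily routines of front-line agency staff. Being data-driven is not primarily a technology challenge; it is a challenge of organizational direction and leadership.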
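A series of apartment fires in New York City in 2011 sharpened our focus on data. In this case, we already had the data that could have saved lives.
In the spring of 2011, two apartment fires in the Bronx and Brooklyn killed five people because of unsafe living conditions. Fires like these are not isolated incidents. When many people live in unsafe apartments, with portable cooking devices, questionable wiring, and inadequate fire escape access, catastrophic fires will take lives. Such conditions are common in a densely populated city; New York receives more than 20,000 complaints a year from residents who suspect that housing is unsafe.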
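New York collects a great deal of information about the buildings in the city. We know when and how a building was built, whether it receives water service, and whether it is occupied; we also know whether it complies with environmental regulations.
Every day we receive more than 30,000 location-specific service requests (complaints) from New Yorkers through 311. We also know about each building's surroundings: how many 911 calls have come from that block, whether road construction has been completed, whether there have been accidents at the intersections, and what kinds of businesses are nearby.
For the two buildings that burned, the City already held information on tax status, complaints, sanitation violations, and building wall violations before the fires. Did we know enough about these buildings before the fires did serious harm? Could we determine which pieces of information are the most valuable predictors of catastrophic outcomes? Our team, the Mayor's Office of Data Analytics, set out to answer those questions.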
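Whether in a town of 25,000 or in New York City, providing safe, sufficient, and affordable housing is a priority for local leaders. Every year more people move to New York City; as they do, housing demand rises, rents keep climbing, and finding suitable housing becomes harder.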
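In response, the City continues to build new affordable housing and to maintain its large existing housing stock. Unscrupulous landlords, however, often exploit the demand by offering substandard apartments. They subdivide apartments with no regard for fire safety. They put locks on bedroom doors in single-family houses and rent the rooms out like hotel rooms. They put a half bath in a garage, seal the door with tape, and rent out the space. They put beds next to basement boilers, where carbon monoxide poisoning and boiler explosions are real risks. In general, these spaces are badly overcrowded and lack sanitary and safe conditions. The City classifies such substandard apartments as "illegal conversions."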
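The New York City Building Code has one primary goal: safety. The code was not created out of thin air; it has been refined through hundreds of years of civil enforcement in the city, often in response to disasters. Rules on fire escape access, room size, and basement occupancy all exist to keep New Yorkers from dying in building accidents. The City enforces the code with a team of building inspectors, who examine buildings during construction and continue to monitor them afterward. These inspectors are trained professionals. When they find an illegal conversion, they act at once, either making sure the space is brought up to safe living standards or vacating it so that residents are out of harm's way. With new residents arriving every day and landlords ready to take advantage of them, the City must deal with a constantly growing stock of illegally converted spaces.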
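The City's largest source of intelligence on illegal conversions is New Yorkers who report them by calling 311 (or through the website or mobile app). We have millions of eyes and ears across the city, and every day we receive more than 30,000 pieces of intelligence. Some of it has immediate, direct value. When a resident reports a broken street light, for example, we can send a crew to replace the bulb and the street is lit again; anyone can see whether a lamp is working, so these reports are reliable.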
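Illegal conversions are far more complex. The person who complains usually has no direct access to the space and is forming a first impression from what is visible outside: the people going in and out, the number of cars parked on the street, or the amount of trash the building produces. Unfortunately, only eight percent of 311 illegal conversion complaints turn out to be high-risk illegal conversions.
Illegal conversions are the worst of the worst because they are where people are most likely to lose their lives. Yet when we receive a complaint and send an inspector, ninety-two percent of the time there is no serious safety risk.
That is not to say those ninety-two percent are worthless. When inspectors visit sites with less serious violations, the information they bring back still helps us build a profile of the location.
Still, our inspectors' numbers and time are limited. What we really want is to sift through the 311 illegal conversion complaints and pick out the eight percent with serious violations, where inspectors should be sent immediately.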
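Mayor Bloomberg is the most data-driven mayor in the nation, and thanks to his twelve years of leadership we have the data needed to build a catastrophic risk model. By analyzing historic outcomes (past illegal conversion violations) and identifying the characteristics those locations share, we can take each inbound 311 illegal conversion complaint, combine the risk model with what we know about the building, and predict how serious the complaint is and whether an inspector should be sent right away.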
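It is worth noting that although our team now uses sophisticated tools and data, we started this project with a few old computers and versions of Microsoft Excel that broke down beyond 36,000 rows of data. Even with the simple tools found on any ordinary business computer, a talented young analyst was able to carry out the correlative analysis we needed on the 311 complaints.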
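The experience of filtering illegal conversion complaints for the Department of Buildings (DOB) showed us how hard it can be to obtain agency data and make sense of it, especially when analyzing databases from different agencies at the same time.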
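Large organizations tend to operate in silos, and few organizations exemplify the problem more than cities. New York City, for example, has more than forty agencies and more than 290,000 employees. Traditionally, each agency has focused on its own responsibilities (policing, fire prevention and response, health, and so on), usually working independently and keeping its data behind its own firewall. Even on special projects, where analysts from several agencies carried out cross-functional analysis, data sharing was one-off and allowed only a moment-in-time analysis; there was no ongoing data cooperation that linked performance measurement to solutions. Half the effort of becoming data-driven lies in connecting databases, and that is an organizational challenge, not a technical one.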
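There is an important difference between collecting data and connecting it. Data collection is based on the actual operation of services in the field. Our analytics team receives very tactical data, such as the number of trees that fell in a storm, and our job is to work with the data that is collected. The Parks Department, for example, decides how to respond to a fallen tree and how to record that response; we take the data, but we do not let data collection get in the way of critical operations. Changing how data is collected for the sake of analytics can become a political problem, and at the very least it raises the organizational problem of retraining front-line staff.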
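Rather than constantly creating and collecting new data, we rely on what is already collected and keep consulting the agencies as they change and modernize their practices. Fortunately, New York City has moved toward business-style metric reporting over the past decade, and a great deal of data is available. Under Mayor Bloomberg, every agency reports its performance against annual goals directly to New Yorkers. Those goals matter, but our team is more interested in the underlying data that tracks performance.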
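Data connection is different. In the past, when the Parks Department removed a tree that had fallen on a sidewalk on a Wednesday and the Transportation Department repaired the sidewalk on Thursday, we had no way to connect the two records. The first problem was that the data was not held in one place; the second was that even if it had been, we had no clear way to link the records. Each agency has its own operational needs, and every dataset was designed for sound reasons, but the result is that data from different agencies can be very hard to connect. One department may identify the location of a downed tree with a GIS identifier, while another refers to it by its cross streets.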
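We found that BBL/BIN (borough block lot/building identification number), together with a specialty geocoding program written by one of our analysts, was the Rosetta Stone for connecting the city's operational intelligence. BBL and BIN are the standard way most city agencies identify a location, although not every agency uses them. We can take whatever geographic data we have (an address, an intersection, and so on) and geocode it to the nearest BBL/BIN. By focusing on the common denominator, in this case structures at specific locations, we can link datasets that had never been connected before.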
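Integrated data matters because of its use in solving major problems: the more data we have for correlative analysis, the easier it is to build risk filters. For illegal conversions, two of the most important inputs are whether the building is current on its property taxes and whether banks have filed mortgage foreclosures. The two come from different sources, the New York City Department of Finance and the Office of Court Administration (mortgage default records), and continued access to both is necessary for the filter to remain effective.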
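The ability to connect and analyze data is powerful, but it still depends on agencies providing their data. We could demand that agencies comply with our project, but anyone who has worked on a team knows that is the worst approach. The agencies exist to deliver public services, and what we really do is help them deliver those services more effectively. So we treat the agencies as our clients and sell our analytics service to them. We propose solutions to the problems they face in order to make their work easier. It is all about them, as it should be. They are the people who keep this city safe and clean every day; if we can show that we will help them do their jobs better with little effort on their part, they will want to work with us. As long as we deliver the value of our service back to them, we have, without exception, received what we need.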
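It is worth noting that there is still a great deal of city data we do not have. We do not yet have data from the New York City Department of Education or from internal employee management systems, nor data on particulate matter in the wastewater at sewage treatment plants, nor pollen counts on a given street. Remember that you do not need all the data to get started; what you do need is a reason to collect and connect the data you ask for. When we need data on particulate matter at the sewage treatment plants, we will contact the Department of Environmental Protection and begin collecting it. Until then, we start from what we already have. A rational, project-based approach to collecting and connecting data works best.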
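When we need to collect information from an agency, we ask for access to its IT systems and for all of its information to be shared. Agencies are not obliged to agree, but in practice they usually do, for two reasons. First, the data exchange gives them access to other agencies' information, so they no longer have to phone another agency's IT department on the first Tuesday of every month to request a one-off query; they can get the information automatically through our data-sharing platform. Second, and more important, agencies are willing to share their data with us because we help them.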
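Just as data has no value unless it is produced for a specific goal, neither does a centralized analytics team. A good analytics team does not go looking for new problems to solve; it works with people in the field on existing problems, making their work more effective without adding to their burden.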
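It is the agencies, and especially the employees who actually do the work and know every detail of the service, who matter most. They can give us the most accurate, first-hand observations and work with us to solve problems. They are also the people who will implement whatever solution the analysis produces. Bringing them into the analysis and the search for solutions is essential to delivering more valuable service, and the best way to bring them in is to work on problems that affect their day-to-day work.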
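For the building inspectors, that meant intelligently and automatically ranking work orders by how serious each complaint is. Inspectors have a great deal of professional experience, and they judge complaints against that experience to pick out the worst cases. But with more and more illegally converted buildings, fewer and fewer inspectors, and more and more 311 complaints, assessing the complaints by hand has become an onerous challenge. When we use the risk filter to set priorities, we are not ignoring the inspectors' experience; we are giving them an automated first pass that makes their work easier. They can still read the complaints and reorder them based on what they know; we simply start them off with an intelligently ranked list.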
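We can talk with the agencies about the benefits of analytics all day, but what they really care about is results. We have a mayor who cares about return on investment (ROI), a budget office built around ROI, and agency heads who care about ROI. If we ask them to invest time and provide data in order to improve their service, then we should deliver better service; at the very least, we should measure the change in service levels to understand the impact.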
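Measuring results may require rethinking the metrics. For the Department of Buildings, for example, the goal of the illegal conversion risk filter is to reduce deaths from fires and structural collapses. In reality, thanks to the professionalism of the agencies, such events are already very rare, even in a city as large as New York, and it is hard to demonstrate a clear improvement from so small a baseline. We therefore had to think about leading indicators of outcomes. For catastrophic building incidents, "vacate orders" are a leading indicator. Remember that in the case of illegal conversions, our building inspectors follow up on every 311 complaint; sooner or later they will find and remediate all the illegal conversions that have been reported. Re-prioritizing does not change the total number of illegal conversions found. What matters is the "sooner" rather than the "later." Where an illegal conversion carries a fire risk, it makes a great difference whether we inspect within three days of a complaint or thirty days later.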
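By re-prioritizing complaints, we find the most serious conditions faster and shorten our response time to the most dangerous places, which reduces the number of days residents spend at risk. We calculated this as a reduction in fire-risk days.
Because of the program's success, the Department of Buildings will add risk-based, outcome-based metrics as critical performance indicators in our next management report. We focused on what matters most in this analytics project and made it the key to how we measure performance: we track the reduction in the number of days people spend at risk of fire.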
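The greatest challenge for an analytics team is turning insight into action. Insight is powerful, but it is worthless if operations in the field do not change. The smaller the change to field operations and the less trouble it causes field staff, in other words the lighter the footprint, the more likely the insight is to be put into practice.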
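To know which changes will be disruptive and which will not, the analytics team needs a firm grasp of how front-line staff actually work. When we work with an agency, we shadow its staff to understand how the job is really done. Seeing the work first-hand is very different from hearing about it in a meeting or reading about it in a document, so this is an important step in the process. We also learned quickly to discount any intervention that changes the way the front line works: retraining front-line staff or reorienting operations around a new process meets strong organizational resistance, so new training and new processes are non-starters. Even a new form is resented and runs into obstacles.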
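Our concept is simple: a light footprint means that the solution is delivered upstream of the front line. If our task is to reorder inspections, we build the ordering into the inspection assignment system, so that assignments already carry a priority by the time they reach inspectors. If our solution connects two previously unrelated pieces of information and produces a new one, we make sure the new information is delivered alongside the data already being reported, not through a different, separate channel. It sounds simple, but it is easy to get wrong. Do not change the front-line workflow; change the outcome.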
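The buzz about big data seems to have come out of thin air, but real results come only from years of hard work. As with any business or government reform, the steps are incremental.
Analytics is not magic, and it is not necessarily complicated. Analytics really means intelligence, and intelligence means better information that helps us make better decisions, for example by automatically gathering and analyzing information to rank work orders. The most important point to remember is that we are not changing the approach; we are only making it more effective and smoother.
An effective analytics project uncovers and makes sense of what could not be seen before, and lets the results speak for themselves.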
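When I joined Mayor Bloomberg's team at the end of 2009, I was alone in a small room. At first I made phone calls, read organization charts and data tables, and browsed the City's open data pages to see what was available, and I spent whatever time I had, during or after working hours, visiting every office to see what was going on. Six months later I hired my first analyst, a friendly recent graduate who had majored in economics and won his baseball league three years in a row. We tried several projects that produced no results but taught us invaluable lessons about how to bring different kinds of city data together to learn what we needed to know. It was not until the spring of 2011, almost a year and a half after I started, that we delivered our first actionable insight. In the two years since, we have become a core part of how the government operates, implementing analytics-based systems citywide for safety, emergency response, disaster response and recovery, economic development, and tax enforcement, and we are only beginning to scale up.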
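This is not triumphalism; the work has not been easy. I keep a quote from Roosevelt on my desk, and I have read it over and over every day since the first day.
"It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat." (Roosevelt, 1910)
I want to stress that you should start with the following lessons we learned in mind: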
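- You do not need a lot of specialized personnel.
- You do not need a lot of high-end technology.
- You do not need "perfect" data (but you do need complete data).
- You must have strong executive support.
- You must talk to the people behind the data, see what they see, and experience what they experience.
- You must focus on producing insights and recommendations your clients can act on immediately, with minimal disruption to existing operations.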
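Above all, you must keep delivering a high-quality product while staying flexible. In New York City's analytics program, we welcome pragmatic, creative problem solvers; talkers need not apply. Finally, always remember this: at every moment, all of this effort is about making your city and its people better off. Dig in, find out, and act, and you will be surprised by what you discover.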
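About the author: Michael Flowers is Chief Open Platform Officer and Chief Analytics Officer for the City of New York. Before joining the Bloomberg administration, Mr. Flowers was counsel to the US Senate Permanent Subcommittee on Investigations for the 110th and 111th Congresses, where he led bipartisan investigations into offshore tax haven abuses, failures by commercial banks and government agencies in the mortgage-backed securities market, and deceptive financial transactions by the North Korean government. From March 2005 to December 2006, Mr. Flowers was Deputy Director of the Department of Justice's Regime Crimes Liaison Office in Baghdad, Iraq, supporting the investigation and trial of Saddam Hussein and senior members of his regime. Mr. Flowers is an honors graduate of Temple University School of Law in Philadelphia.
References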
- Roosevelt, Theodore (1910). “Citizenship in a Republic.” [Speech] Retrieved from http://www.theodore-roosevelt.com/trsorbonnespeech.html
Being a data-driven city is about more efficiently and effectively delivering the core services of the city. Being data-driven is not primarily a challenge of technology; it is a challenge of direction and organizational leadership.
College seminars, management consultants, and whole sections of the Wall Street Journal have all started to focus on something called “big data.” The general definition of big data that’s evolving is that it’s an exponentially larger set of information than we’re accustomed to analyzing, generated by machines, produced frequently, and often tagged with geo-location. The applications of big data are often an afterthought, while the conversation focuses on the quantity of data, how we’ll warehouse it, and assumptions along the general ethos of “more is better.” The reality is that big data holds promise, but it should not be confused with being data-driven.
A focus on outcomes is often lost in the discussion of big data because it is so frequently an afterthought. We have a huge fire hose of information, but even a fire hose is only valuable when it’s pointed at a fire. Data by itself is not inherently valuable. Collecting information about traffic patterns in a CSV file is not in itself helpful; the data becomes more valuable when it is used to form traffic-enabled maps and when city planners use the information to redesign traffic patterns. However, what really matters is not the CSV file, the map, or the traffic patterns, but the outcomes: using data to improve traffic and cut down on commute time, reduce automobile traffic and improve our air quality in the city, create crosswalks and bike lanes that decrease the incidents of car and truck accidents with pedestrians and cyclists, and allow us to live faster, cleaner, and safer lives.
If you’re looking for well-managed, focused, and data-driven institutions, look no further than the major American cities. City governments provide the services that are the backbone of modern life: the water we use when we brush our teeth in the morning; the roads, buses, and subways that take us to work; the teams that keep our streets clean and our parks green; the schools where our children are educated; and the police and fire forces that keep us safe. Increasingly, we see that Americans are choosing to live in cities. Attracted by the economic and cultural opportunities, Americans and immigrants are pursuing their dreams right alongside hundreds of thousands, if not millions, of fellow citizens. They’re not drawn to spacious apartments or luxurious commutes; in fact, they’re often making trade-offs on housing and transportation. They’re moving because they are committed to an urban life.
This great urban migration is placing even higher levels of demand on basic city infrastructure: water, sewer, fire, police, housing, healthcare, education, parks, and so on are all in higher demand. Meanwhile, cities have even fewer resources to meet those needs. In response to economic conditions of the last decade, cities have witnessed tax revenues that are lower on a per-capita basis, which means that mayors and city leadership are forced to do more with less. In practice, that means finding new ways to get even better outcomes out of our current systems and processes.
A data-driven city is a city that intelligently uses data to better deliver critical services. Transparency, open data, and innovation are all important parts of the modern civic identity, especially in a city like New York, which is focused on strengthening its position as a tech leader. However, being a data-driven city is really about more efficiently and effectively delivering the core services of the city: smarter, risk-based resource allocation, better sharing of information agency-to-agency to facilitate smart decision-making, and using the data in a way that integrates in the established day-to-day patterns of city agency front line workers. Being data-driven is not primarily a challenge of technology; it is a challenge of direction and organizational leadership.
For New York, a series of 2011 apartment fires helped galvanize our focus on the ability of data—in this case, the data that we already had—to save lives.
In the spring of 2011, a pair of fires in apartment buildings in the Bronx and Brooklyn killed five people as a result of unsafe living conditions. This sort of fire is not an isolated incident. When many people crowd into unsafe apartment conditions, with portable cooking devices, questionable electrical wiring, and inadequate fire escape access, catastrophic fires will take lives. The occurrence is all too common in a densely populated city like New York. The City receives over 20,000 citizen complaints a year about buildings suspected of being unsafe boarding houses.
New York collects an immense amount of information about every single one of our buildings. We know when and how buildings were built; we know if the building is receiving water service and is, therefore, inhabited; and we know if buildings are in good order based upon the location’s history of ECB (Environmental Control Board) violations on quality of life issues. Every day, we receive over 30,000 service requests (complaints) through 311 from New Yorkers, which gives us more location-specific intelligence. We know even more about the neighborhood where the building is located: we know how often 911 runs are made to that block-face, if road construction is being done, if there are accidents in the intersections, and what kinds of businesses are on the block.
In the case of the fires in the two buildings, by the time they occurred, the City had information on tax liens, building complaints, sanitation violations, and building wall violations. Did we know enough about these buildings before the fires to have raised a red flag? Could we determine which pieces of information are the most valuable predictors of catastrophic outcomes? Our team, the Mayor’s Office of Data Analytics, set to work to answer those questions.
Providing safe, abundant, and affordable housing is a priority for the leader of every community, from the mayor of a town of 25,000 to the mayor of New York City. Every year, more people move to New York City, and as they do, housing demand increases, the price of rent grows, and individuals are often in a bind as they search for affordable housing.
Because of this strain, the City continues to invest in constructing new affordable housing and maintaining our large system of affordable housing buildings. However, unscrupulous landlords often take advantage of the high demand by providing substandard apartments. They create these apartments by subdividing existing space, with disregard to fire exit access. They put deadbolts on bedroom doors in single-family houses and rent them out as hotel rooms. They put a half bath into a garage, seal the door with tape, and rent out the space. They put beds next to boilers in basements, which is an area that is prone to carbon monoxide poisoning and boiler explosions. In general, they allow for gross over-occupation of small spaces without sanitary conditions. The City classifies these substandard apartments as “illegal conversions.”
The New York City Building Code has one primary goal: safety. That code wasn’t created out of thin air; it has been created and refined with hundreds of years of civil enforcement in the city, often in response to catastrophic accidents. Rules around fire escape access, size of space, inhabitation of basements, etc., are all designed to prevent New Yorkers from dying in building accidents. The City enforces that building code with a team of building inspectors, who always examine buildings in the construction process and continue to monitor buildings as they mature. These inspectors are trained professionals. When they find an illegal conversion, they do a great job of enforcing the code by either ensuring that the space is immediately configured for safe living or vacating the space to get the residents out of the path of harm. With new residents moving to the city every day, though, and landlords willing to take advantage of them, especially those who are most vulnerable to exploitation, the City must address a constantly growing and changing stock of illegally converted living spaces.
The City’s single largest source of intelligence on illegal conversions is New Yorkers who phone in (or use the web or mobile app) to 311 with tips. We have millions of eyes and ears on the street, and every day, we get over 30,000 new pieces of intelligence. Often, that intelligence has immediate, direct value; when a New Yorker calls in a street light that’s gone out, we’re able to send a truck and replace the bulb. Almost every single one of those street light complaints is founded, meaning that the light is actually out. That makes sense because you can look at the lamppost and see if it’s shining or not. Seeing an illegal conversion is much more complex. The individual who makes the complaint often has no direct access to the space, and instead, they’re forming their hypothesis based on what they see on the outside of the building in terms of population flow in and out of the building, the number of cars parked on the street, the amount of trash generated by the building, etc. Unfortunately, only eight percent of the specific 311 illegal conversion complaints from the citizenry are actually high-risk illegal conversions.
Illegally converted housing spaces are the worst of the worst because they are the places where we’re most likely to lose lives. When we send out a building inspector to look at an illegal conversion complaint, ninety-two percent of the time, they get there and there’s nothing serious in terms of safety risk. That’s not to say that those ninety-two percent of complaints are worthless. They often send inspectors to places where less serious violations are found, and the very act of sharing intelligence on a location helps us build up the profile of the space. Still, we have a limited number of inspectors, and they have a limited amount of time. What we really want to do is sift through that pile of 311 illegal conversion complaints and find the eight percent of complaints that are the most serious. That’s where we should be sending inspectors immediately.
Thanks to twelve years of leadership by Mayor Bloomberg, the nation’s most data-driven mayor, we have no shortage of data from which to build a catastrophic risk model. By conducting an analysis of historic outcomes (past illegal conversion violations) and identifying similar characteristics in those locations, we were able to create a risk model that takes each inbound 311 illegal conversion complaint, pairs it with existing information we know about that building, and predicts whether or not the complaint is most likely to be founded, meaning there are severely bad living conditions.
It is important to note that while our team has evolved to use sophisticated tools and data, we started this project out with a couple of old desktops and versions of Microsoft Excel that broke down after 36,000 rows of data. Using the rudimentary tools that are found on virtually every business machine, a talented young analyst was able to conduct the correlative analysis that told us what to look for in the 311 complaints.
By prioritizing the complaints that are most likely to be dangerous, we are remediating dangerous conditions faster, without any additional inspector manpower. That is a safety-based resource allocation plan.
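The prioritization described above can be sketched in a few lines. This is a minimal illustration, not the City's actual model: the historic records, the two building attributes (`tax_delinquent`, `foreclosure_filed`), and the idea of using a simple founded-rate lookup per building profile are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical historic inspection outcomes:
# (tax_delinquent, foreclosure_filed, complaint_was_founded)
HISTORY = [
    (True, True, True),
    (True, True, True),
    (True, True, False),
    (True, False, True),
    (True, False, False),
    (False, True, False),
    (False, True, True),
    (False, False, False),
    (False, False, False),
    (False, False, False),
]


def founded_rates(history):
    """Share of past complaints that were founded, per building profile."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [founded, total]
    for tax, foreclosure, founded in history:
        bucket = counts[(tax, foreclosure)]
        bucket[0] += int(founded)
        bucket[1] += 1
    return {profile: f / t for profile, (f, t) in counts.items()}


def prioritize(complaints, rates, default=0.0):
    """Sort inbound complaints so the highest estimated risk comes first."""
    def score(complaint):
        profile = (complaint["tax_delinquent"], complaint["foreclosure_filed"])
        return rates.get(profile, default)
    return sorted(complaints, key=score, reverse=True)


RATES = founded_rates(HISTORY)
INBOUND = [
    {"id": "A", "tax_delinquent": False, "foreclosure_filed": False},
    {"id": "B", "tax_delinquent": True, "foreclosure_filed": True},
    {"id": "C", "tax_delinquent": True, "foreclosure_filed": False},
]
QUEUE = [c["id"] for c in prioritize(INBOUND, RATES)]
```

The number of inspections stays the same; only their order changes, which is why no additional inspector time is needed.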
The experience of the Department of Buildings’ illegal conversions risk filter demonstrated firsthand for us how difficult it could be to gain access to agency datasets and make sense of them, especially in the context of simultaneously analyzing datasets from different city agencies.
Large organizations are often stove-piped, and few organizations exemplify that problem more than cities. New York City, for instance, has over forty different agencies and over 290,000 employees. Traditionally, these agencies have focused on their chartered responsibilities (policing, fire prevention and response, health, etc.) often independently and kept data within their walls. Even on special projects, where analysts from multiple agencies conducted a cross-functional analysis, the data sharing was one-off and only allowed for a moment-in-time analysis. There was no ongoing data cooperation that allowed for performance measurement and solution iteration. Half of the effort to becoming data-driven is connecting the data, and that is an organizational challenge, not a technological one.
There is an important distinction between collecting and connecting data. Data collection is based upon the actual operation of services in the field. Our analytics team gets very tactical data, for instance, on the numbers of trees that fall down during a storm. It’s our job to work with the data that is currently collected. For instance, the Parks Department decides how to respond to a tree and how to record that information, and we take it, but we do not let data collection get in the way of critical operations. Using analytics as a reason to change data collection can become a political problem, and at the very least, it is an organizational problem of retraining the front line. Instead of constantly pushing for new data, we rely upon what is already being collected and consult the agencies over time as they change and modernize their practices. Fortunately, cities have moved toward business reporting metrics in the last decade, and there is already a lot of data available. Led by Mayor Bloomberg, all city agencies measure their performance against annual goals and report that performance directly to New Yorkers. Those goals are important, but what we’re really interested in is the underlying data that tracks performance.
Data connection is different. In the past, when the Parks Department removed a tree that fell down on a sidewalk on a Wednesday and the Transportation Department went to repair the sidewalk on a Thursday, we had no way of connecting those two pieces of data. The first problem is that they are not housed together. The second problem is that even if we had them together, we wouldn’t have had a clear way to connect them. Each agency has its own ontology of terms and data that have all been created through reasonable, rational evolution of service, but which sometimes make it nearly impossible to connect that data. One department may use a GIS identifier for the location of the downed tree, whereas another may refer to it by its cross streets.
We found that BBL/BIN (borough block lot/building identification number), along with a specialty geocoding software program one of our analysts wrote, was the Rosetta Stone for connecting the city’s operational intelligence. For most city agencies, BBL and BIN are the standard way of identifying a location, although they’re not used by all agencies, nor are they universally appropriate. Even so, we can take whatever geo data we have (an address, an intersection, etc.) and geocode it to the nearest BIN/BBL. By focusing on the common denominator, which in this case is structures in specific locations, we’re able to tie together datasets that had never been linked.
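As a rough illustration of linking records through a shared building identifier, the sketch below snaps each agency record to the nearest building and groups the records under that key. Everything in it is hypothetical: the BIN values, the flat x/y coordinates, and the record fields are invented for the example, and a real geocoder would work from addresses, intersections, and geographic coordinates instead.

```python
import math

# Hypothetical building registry: BIN -> (x, y) in arbitrary planar units.
BUILDINGS = {
    "BIN-001": (0.0, 0.0),
    "BIN-002": (10.0, 0.0),
    "BIN-003": (0.0, 10.0),
}


def nearest_bin(point, buildings=BUILDINGS):
    """Return the BIN whose coordinates are closest to the given point."""
    return min(buildings, key=lambda b: math.dist(point, buildings[b]))


def link_by_bin(*datasets):
    """Group records from several agencies under the BIN nearest to each record."""
    linked = {}
    for agency, records in datasets:
        for record in records:
            key = nearest_bin(record["location"])
            linked.setdefault(key, []).append((agency, record["event"]))
    return linked


# Two agencies describe the same spot in slightly different ways.
PARKS = [{"location": (9.0, 1.0), "event": "downed tree removed"}]
TRANSPORT = [{"location": (10.5, -0.5), "event": "sidewalk repaired"}]
LINKED = link_by_bin(("Parks", PARKS), ("Transportation", TRANSPORT))
```

Once both records resolve to the same key, the tree removal and the sidewalk repair appear side by side even though neither agency changed how it records locations.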
Having integrated data is important because of its application in stronger problem solving. The more information we have through which to run correlative analyses, the better we can form risk filters. In the case of the illegal conversion filter, two of the most important pieces of input are whether the building is current on its property taxes and whether banks have filed any mortgage foreclosures. Those two pieces of information come from two different sources—the New York City Department of Finance and the Office of Court Administration (mortgage default records), and their continued access is necessary to the ongoing effectiveness of the filter.
The capacity to connect data and analyze it is powerful, but it’s still dependent upon the agencies playing ball by giving us their data to examine. One way to get the data is to demand compliance with our project. Anyone who has ever worked on a team or in a business knows that demanding compliance is rarely the best solution. Instead, we have to sell our service to the agencies. The agencies deliver city services, and because what we really do is help them deliver city services more efficiently, we treat them as our clients. We work toward solutions to their problems in order to make their lives easier. It’s all about them, just as it should be. They are the ones who are actually keeping this city clean and safe every day, and if we can demonstrate that we’ll help them do their jobs better with very little effort and a very small footprint, they’ll want to partner with us. As a result, and without exception, we have never failed to get what we need in order to deliver this service back to them.
It’s important to note that even in our office, we still have lots of city data that is outside of our walls. We don’t yet have granular information from the New York City Department of Education or from internal employee management systems. We also don’t have data on particulate matter at the sewage treatment plants, the pollen counts on a given street, etc. Keep in mind that you don’t need everything to get started, and, conversely, you need a reason to collect and connect the information you ask for. When we have a project that requires particulate matter at the sewage treatment plants, we’ll reach out to the Department of Environmental Protection and collect it, but until then, we’ll work with what we have. A rational, project-based approach to data collection and connection is the best way to build success over time.
When we collect information from agencies, we’re asking for them to give us access to their legacy IT systems and share all of their information. They don’t have to say yes, but they do, for two reasons. First, by participating in the data exchange, they have access to the information of other agencies as well. They’re able to avoid picking up the phone every first Tuesday of the month and calling the IT department of another city agency and asking for a one-off query of information because they’re able to automatically access the information through our data sharing platform. Second, and more importantly, agencies like sharing their data with us because we help them.
Just as data is not valuable without a specific outcome in mind, neither is a centralized analytics team. Intelligently applied, an analytics team does not look for new problems to solve, but works with the teams in the field to solve existing problems in a way that makes their jobs more effective without burdening their work.
It is the agencies, and specifically, the employees at the agencies, who are on the ground and who understand all of the details of service delivery. These are the teams that can give us the best-observed information on what’s going on and how we can work to fix problems. Moreover, these are the teams that are going to implement whatever solution we find through our analysis. Having them on board is fundamentally important to actually delivering more valuable service. The best way to have them on board is to work on a problem that actually impacts their day-to-day lives.
In the case of the building inspectors, that was an intelligent way to automate complaint priority. The building inspectors have an enormous amount of professional experience, and when they are able to read complaints and compare them with their own experience, they’re able to identify those that are often the worst. With fewer and fewer inspectors, more and more illegally occupied buildings, and more and more 311 complaints, devoting the time to successfully risk-assess those complaints one by one by hand has become an onerous challenge. When we use a filter to prioritize tickets, we’re not ignoring the experience level of those inspectors. Instead, we’re giving them a leg up by doing an automated first pass on the inspection priority, essentially applying their accumulated institutional knowledge in an automated fashion. They can still read and reorder based on their knowledge set, but we’re starting them off with an intelligent list.
With these agencies, we can talk about the benefits of an analytics approach all day, but what they really care about are the results. We have an ROI-driven mayor, an ROI-driven budget office, and leaders at all of the agencies that are ROI-driven. If we ask them for their time and their data to improve their delivery of service, we should deliver improved service, and at the very minimum, we should be measuring the change in levels of service in order to understand the impact.
Measuring results may require new ways to think about the metrics. The goal of the Department of Buildings illegal conversion risk filter is to reduce the number of deaths through fires and structural collapses. However, the reality is that due to the professional excellence of our agencies, those events are so rare, even in a city as large as New York, that it can be difficult to accurately measure the performance improvement from such a small dataset. Instead, we had to think about the leading indicators of outcomes.
In the case of catastrophic building incidents, “vacate orders” are a leading indicator. In the case of illegal conversions, remember, our building inspectors go out to all of the 311 complaints. Sooner or later, they are going to find all of the illegal conversions that have been reported and remediate that condition. When we re-prioritize the tickets, we are not altering the total number of illegal conversions that will be found. However, the important part is actually the “sooner” rather than “later” piece. In the case of illegally converted structures, which incidentally are at risk of fire, it makes a huge difference to the residents if we inspect the building three days after a complaint comes in or thirty days later. When we increase the speed of finding the worst of the worst by prioritizing the complaint list, we are reducing our time to respond to the most dangerous places, and we are in effect reducing the number of days that residents are living at risk. We calculated that as a reduction in fire-risk days.
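The text does not spell out how fire-risk days were computed, so the following is one plausible reading rather than the City's formula: sum, over the founded high-risk complaints, the days between complaint and inspection, and compare the totals under the two orderings. The wait times are invented for illustration.

```python
def fire_risk_days(wait_days):
    """Total days residents of founded high-risk sites waited for an inspection."""
    return sum(wait_days)


def reduction(before, after):
    """Fractional reduction in fire-risk days between two orderings."""
    base = fire_risk_days(before)
    return (base - fire_risk_days(after)) / base


# Hypothetical waits (days from complaint to inspection) for the same
# four founded complaints under each ordering.
FIRST_COME = [30, 22, 15, 9]
PRIORITIZED = [3, 2, 4, 3]

SAVED = fire_risk_days(FIRST_COME) - fire_risk_days(PRIORITIZED)  # 76 - 12
SHARE = reduction(FIRST_COME, PRIORITIZED)
```

The same four conversions are found either way; the metric captures only how much sooner they are found.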
As a result of the success of the program, in our next management report, the Department of Buildings will add two risk-based, outcome-based metrics as their critical indicators of performance measurement. This fundamental shift in how we measure performance is directly attributable to focusing on what is most important in this analytics project: we are reducing the amount of time that people are at increased risk of burning to death and that reduction in time is what we’re tracking.
The greatest challenge for the analytics team is moving from insight to action. Insight is powerful, but it’s worthless if the behavior in the field doesn’t change. Getting the analytics into the field depends on creating the lightest footprint possible, so that the intervention doesn’t cause a headache for the worker in the field.
To understand what will or won’t be disruptive, the analytics team needs to get a firm grasp on the way that operations are handled by the front line. When we work with an agency on a project, we shadow them to understand how they actually do their job. The way the work is actually done is often very different from how it’s described on paper or in a meeting, and seeing it firsthand is an important step in the process.
We immediately rule out any intervention that changes the way the front line works. New training and new processes are non-starters: reorienting a large agency around them is as hard as turning a battleship. Even new forms are frowned upon, because they get in the way of, or at least change, how the fieldwork is done.
Our concept is simple: a light footprint means that the solution must be delivered upstream of the front line. If our task is to re-prioritize inspections, we build that automatically into the inspection assignment generation system, so that the assignment already carries a priority level by the time it reaches the inspectors. If our solution is a technological fix that connects two formerly disparate pieces of information and delivers a new piece of information, we make sure that the new piece is delivered right alongside the currently reported data, not through a separate, detached channel. It sounds simple, but it’s so easy to go wrong. Don’t change the front line process; change the outcome.
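As a rough sketch of what "upstream of the front line" can look like, the assignment generator below stamps a priority level onto each work order before it is handed out. Everything here is an assumption made for illustration: the `Assignment` fields, the score thresholds, and the `risk_score` callable stand in for whatever the real assignment system and risk filter use.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

# Hypothetical ordering of priority levels, most urgent first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Assignment:
    """Hypothetical work order; existing fields stay as the inspector knows them."""
    ticket_id: str
    address: str
    received_order: int          # position in the original complaint queue
    priority: str = "unranked"   # the one field added upstream


def priority_level(score: float, high: float = 0.7, medium: float = 0.4) -> str:
    """Map a risk score to a coarse level; the thresholds are illustrative."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


def generate_assignments(
    tickets: List[Assignment],
    risk_score: Callable[[Assignment], float],
) -> List[Assignment]:
    """Stamp each ticket with a priority and order the list before dispatch."""
    stamped = [replace(t, priority=priority_level(risk_score(t))) for t in tickets]
    # Within a priority level, keep the original first-come, first-served order.
    return sorted(stamped, key=lambda a: (PRIORITY_RANK[a.priority], a.received_order))


# Made-up tickets and scores for demonstration.
tickets = [
    Assignment("T1", "12 Example St", 1),
    Assignment("T2", "34 Sample Ave", 2),
    Assignment("T3", "56 Placeholder Rd", 3),
    Assignment("T4", "78 Demo Blvd", 4),
]
scores = {"T1": 0.2, "T2": 0.9, "T3": 0.5, "T4": 0.95}
for a in generate_assignments(tickets, lambda t: scores[t.ticket_id]):
    print(a.ticket_id, a.priority)
```

The inspector receives the same work order as before, with one extra field and a different order in the stack. No new form, tool, or training is involved, which is the point of the light footprint.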
While the buzz around big data seems to have been generated out of thin air, the outcomes associated with it will only come from hard work, with years and years of effort. Just as with any other business or government process, the steps are incremental in nature.
Analytics is not magic, and it’s not necessarily complicated. Analytics really means intelligence, and intelligence is better information that helps us make better decisions. To the extent that we can automate that information gathering and analysis, for instance by automatically sorting the priority level of work orders, we make the approach more efficient. The most important thing to remember, however, is that we are not changing the approach.
An effective analytics project is one that gets in and gets out without being noticed. Let the results speak for the project.
When I first joined the Bloomberg Administration at the end of 2009, it was just me at a cubicle, making phone calls, studying organizational charts and data schematics, browsing our open data page to see what was available, and visiting every office I could, on and off the clock, to see what was going on. It was six months before I hired my first analyst, a fresh-from-college economics major who had won his rotisserie baseball league three years in a row and was preternaturally affable. We tried a few projects that didn’t end up going anywhere but taught us extremely valuable lessons about how to make disparate pieces of city data work together well enough to tell us the stories we needed to hear. It wasn’t until spring 2011, almost a year and a half after I started this project, that we delivered our first actionable insight. In the two years since then, we have become a central component of the administration’s approach to government, implementing citywide analytics-based systems for structural safety, emergency response, disaster response and recovery, economic development, and tax enforcement, and we’ve only just started to scale out.
This isn’t triumphalism; it was far from easy. Tacked up over my desk since my first day is a quote from Teddy Roosevelt, and more days than not, early on, I found myself reading it over and over again.
It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat. (Roosevelt, 1910)
What I’m trying to stress is that you have to start somewhere, while bearing in mind the following lessons we’ve learned:
- You don’t need a lot of specialized personnel.
- You don’t need a lot of high-end technology.
- You don’t need “perfect” data (but you do need the entire set).
- You must have strong executive support.
- You must talk to the people behind the data, and see what they see and experience what they experience.
- You must focus on generating actionable insight for your clients that they can immediately use with minimal disruption to existing logistics chains.
Above all else, you need to be relentless in terms of delivering a quality product, while remaining flexible in terms of how you do it. For New York City’s analytics program, pragmatic, inventive problem solvers are always welcome, but ideologues need not apply. Finally, you need to remember at all times that the point of all this effort is to help your city and its people thrive. Keep all this in mind. Just dive in and do it. You may be amazed at what you find.
Michael Flowers is Chief Analytics Officer and Chief Open Platform Officer for the City of New York. Prior to joining the Bloomberg Administration, Mr. Flowers was Counsel to the US Senate Permanent Subcommittee on Investigations for the 110th and 111th Congresses, where he led bipartisan investigations into off-shore tax haven abuses; failures in the mortgage-backed securitization market by US investment and commercial banks and government agencies; and deceptive financial transactions by the North Korean government. From March 2005 to December 2006, Mr. Flowers was Deputy Director of DOJ’s Regime Crimes Liaison Office in Baghdad, Iraq, supporting the investigations and trials of Saddam Hussein and other high-ranking members of his regime. Mr. Flowers is a magna cum laude graduate of Temple University School of Law in Philadelphia.